From: Ackerley Tng <ackerleytng@google.com>
To: Fuad Tabba <tabba@google.com>
Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org,
mpe@ellerman.id.au, anup@brainfault.org,
paul.walmsley@sifive.com, palmer@dabbelt.com,
aou@eecs.berkeley.edu, seanjc@google.com,
viro@zeniv.linux.org.uk, brauner@kernel.org,
willy@infradead.org, akpm@linux-foundation.org,
xiaoyao.li@intel.com, yilun.xu@intel.com,
chao.p.peng@linux.intel.com, jarkko@kernel.org,
amoorthy@google.com, dmatlack@google.com,
yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com,
mic@digikod.net, vbabka@suse.cz, vannapurve@google.com,
mail@maciej.szmigiero.name, david@redhat.com,
michael.roth@amd.com, wei.w.wang@intel.com,
liam.merwick@oracle.com, isaku.yamahata@gmail.com,
kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
steven.price@arm.com, quic_eberman@quicinc.com,
quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
catalin.marinas@arm.com, james.morse@arm.com,
yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
will@kernel.org, qperret@google.com, keirf@google.com,
roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org,
jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com,
fvdl@google.com, hughd@google.com, jthoughton@google.com,
tabba@google.com
Subject: Re: [RFC PATCH v5 06/15] KVM: guest_memfd: Handle final folio_put() of guestmem pages
Date: Thu, 06 Feb 2025 03:37:06 +0000
Message-ID: <diqz1pwbspzx.fsf@ackerleytng-ctop.c.googlers.com>
In-Reply-To: <20250117163001.2326672-7-tabba@google.com> (message from Fuad Tabba on Fri, 17 Jan 2025 16:29:52 +0000)
Fuad Tabba <tabba@google.com> writes:
> Before transitioning a guest_memfd folio to unshared, thereby
> disallowing access by the host and allowing the hypervisor to
> transition its view of the guest page as private, we need to be
> sure that the host doesn't have any references to the folio.
>
> This patch introduces a new type for guest_memfd folios, and uses
> that to register a callback that informs the guest_memfd
> subsystem when the last reference is dropped, therefore knowing
> that the host doesn't have any remaining references.
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
> The function kvm_slot_gmem_register_callback() isn't used in this
> series. It will be used later in code that performs unsharing of
> memory. I have tested it with pKVM, based on downstream code [*].
> It's included in this RFC since it demonstrates the plan to
> handle unsharing of private folios.
>
> [*] https://android-kvm.googlesource.com/linux/+/refs/heads/tabba/guestmem-6.13-v5-pkvm
> ---
> include/linux/kvm_host.h | 11 +++
> include/linux/page-flags.h | 7 ++
> mm/debug.c | 1 +
> mm/swap.c | 4 +
> virt/kvm/guest_memfd.c | 145 +++++++++++++++++++++++++++++++++++++
> 5 files changed, 168 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 84aa7908a5dd..63e6d6dd98b3 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2574,6 +2574,8 @@ int kvm_slot_gmem_clear_mappable(struct kvm_memory_slot *slot, gfn_t start,
> gfn_t end);
> bool kvm_slot_gmem_is_mappable(struct kvm_memory_slot *slot, gfn_t gfn);
> bool kvm_slot_gmem_is_guest_mappable(struct kvm_memory_slot *slot, gfn_t gfn);
> +int kvm_slot_gmem_register_callback(struct kvm_memory_slot *slot, gfn_t gfn);
> +void kvm_gmem_handle_folio_put(struct folio *folio);
> #else
> static inline bool kvm_gmem_is_mappable(struct kvm *kvm, gfn_t gfn, gfn_t end)
> {
> @@ -2615,6 +2617,15 @@ static inline bool kvm_slot_gmem_is_guest_mappable(struct kvm_memory_slot *slot,
> WARN_ON_ONCE(1);
> return false;
> }
> +static inline int kvm_slot_gmem_register_callback(struct kvm_memory_slot *slot, gfn_t gfn)
> +{
> + WARN_ON_ONCE(1);
> + return -EINVAL;
> +}
> +static inline void kvm_gmem_handle_folio_put(struct folio *folio)
> +{
> + WARN_ON_ONCE(1);
> +}
> #endif /* CONFIG_KVM_GMEM_MAPPABLE */
>
> #endif
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 6615f2f59144..bab3cac1f93b 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -942,6 +942,7 @@ enum pagetype {
> PGTY_slab = 0xf5,
> PGTY_zsmalloc = 0xf6,
> PGTY_unaccepted = 0xf7,
> + PGTY_guestmem = 0xf8,
>
> PGTY_mapcount_underflow = 0xff
> };
> @@ -1091,6 +1092,12 @@ FOLIO_TYPE_OPS(hugetlb, hugetlb)
> FOLIO_TEST_FLAG_FALSE(hugetlb)
> #endif
>
> +#ifdef CONFIG_KVM_GMEM_MAPPABLE
> +FOLIO_TYPE_OPS(guestmem, guestmem)
> +#else
> +FOLIO_TEST_FLAG_FALSE(guestmem)
> +#endif
> +
> PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc)
>
> /*
> diff --git a/mm/debug.c b/mm/debug.c
> index 95b6ab809c0e..db93be385ed9 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -56,6 +56,7 @@ static const char *page_type_names[] = {
> DEF_PAGETYPE_NAME(table),
> DEF_PAGETYPE_NAME(buddy),
> DEF_PAGETYPE_NAME(unaccepted),
> + DEF_PAGETYPE_NAME(guestmem),
> };
>
> static const char *page_type_name(unsigned int page_type)
> diff --git a/mm/swap.c b/mm/swap.c
> index 6f01b56bce13..15220eaabc86 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -37,6 +37,7 @@
> #include <linux/page_idle.h>
> #include <linux/local_lock.h>
> #include <linux/buffer_head.h>
> +#include <linux/kvm_host.h>
>
> #include "internal.h"
>
> @@ -103,6 +104,9 @@ static void free_typed_folio(struct folio *folio)
> case PGTY_offline:
> /* Nothing to do, it's offline. */
> return;
> + case PGTY_guestmem:
> + kvm_gmem_handle_folio_put(folio);
> + return;
> default:
> WARN_ON_ONCE(1);
> }
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index d1c192927cf7..722afd9f8742 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -387,6 +387,28 @@ enum folio_mappability {
> KVM_GMEM_NONE_MAPPABLE = 0b11, /* Not mappable, transient state. */
> };
>
> +/*
> + * Unregisters the __folio_put() callback from the folio.
> + *
> + * Restores a folio's refcount after all pending references have been released,
> + * and removes the folio type, thereby removing the callback. Now the folio can
> + * be freed normally once all actual references have been dropped.
> + *
> + * Must be called with the filemap (inode->i_mapping) invalidate_lock held.
> + * Must also have exclusive access to the folio: folio must be either locked, or
> + * gmem holds the only reference.
> + */
> +static void __kvm_gmem_restore_pending_folio(struct folio *folio)
> +{
> + if (WARN_ON_ONCE(folio_mapped(folio) || !folio_test_guestmem(folio)))
> + return;
> +
> + WARN_ON_ONCE(!folio_test_locked(folio) && folio_ref_count(folio) > 1);
> +
> + __folio_clear_guestmem(folio);
> + folio_ref_add(folio, folio_nr_pages(folio));
> +}
> +
> /*
> * Marks the range [start, end) as mappable by both the host and the guest.
> * Usually called when guest shares memory with the host.
> @@ -400,7 +422,31 @@ static int gmem_set_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
>
> filemap_invalidate_lock(inode->i_mapping);
> for (i = start; i < end; i++) {
> + struct folio *folio = NULL;
> +
> + /*
> + * If the folio is NONE_MAPPABLE, it indicates that it is
> + * transitioning to private (GUEST_MAPPABLE). Transition it to
> + * shared (ALL_MAPPABLE) immediately, and remove the callback.
> + */
> + if (xa_to_value(xa_load(mappable_offsets, i)) == KVM_GMEM_NONE_MAPPABLE) {
> + folio = filemap_lock_folio(inode->i_mapping, i);
> + if (WARN_ON_ONCE(IS_ERR(folio))) {
> + r = PTR_ERR(folio);
> + break;
> + }
> +
> + if (folio_test_guestmem(folio))
> + __kvm_gmem_restore_pending_folio(folio);
> + }
> +
> r = xa_err(xa_store(mappable_offsets, i, xval, GFP_KERNEL));
> +
> + if (folio) {
> + folio_unlock(folio);
> + folio_put(folio);
> + }
> +
> if (r)
> break;
> }
> @@ -473,6 +519,105 @@ static int gmem_clear_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
> return r;
> }
>
I think one of these mappability-restoring functions needs to be called on
truncation to restore the refcounts. Without that, any folio still in the
transient state at truncation time would only hold the transient/speculative
references, and truncation would then drop filemap refcounts that were
already taken off when the folio_put() callback was set up.

Should mappability be restored according to GUEST_MEMFD_FLAG_INIT_MAPPABLE?
Or should a mappability of NONE be restored to GUEST, with a mappability of
ALL left as ALL?
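Something along these lines is roughly what I had in mind, as an untested
sketch that just reuses the helpers from this patch. gmem_restore_pending_range()
is a hypothetical name, and the sketch deliberately leaves the policy question
above open: it only clears the transient state and gives back the filemap
refcounts before truncation drops them, without rewriting the xarray entries.

static void gmem_restore_pending_range(struct inode *inode, pgoff_t start,
                                       pgoff_t end)
{
        struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
        pgoff_t i;

        rwsem_assert_held_write_nolockdep(&inode->i_mapping->invalidate_lock);

        for (i = start; i < end; i++) {
                struct folio *folio;

                /* Only offsets still in the transient state had refcounts taken off. */
                if (xa_to_value(xa_load(mappable_offsets, i)) != KVM_GMEM_NONE_MAPPABLE)
                        continue;

                folio = filemap_lock_folio(inode->i_mapping, i);
                if (IS_ERR(folio))
                        continue;

                /* Drops the guestmem type and returns the filemap refcounts. */
                if (folio_test_guestmem(folio))
                        __kvm_gmem_restore_pending_folio(folio);

                folio_unlock(folio);
                folio_put(folio);
        }
}

It would presumably be called from the hole-punch/truncate path with the
invalidate_lock already held, before the pages are actually truncated.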
> +/*
> + * Registers a callback to __folio_put(), so that gmem knows that the host does
> + * not have any references to the folio. It does that by setting the folio type
> + * to guestmem.
> + *
> + * Returns 0 if the host doesn't have any references, or -EAGAIN if the host
> + * has references, and the callback has been registered.
> + *
> + * Must be called with the following locks held:
> + * - filemap (inode->i_mapping) invalidate_lock
> + * - folio lock
> + */
> +static int __gmem_register_callback(struct folio *folio, struct inode *inode, pgoff_t idx)
> +{
> + struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
> + void *xval_guest = xa_mk_value(KVM_GMEM_GUEST_MAPPABLE);
> + int refcount;
> +
> + rwsem_assert_held_write_nolockdep(&inode->i_mapping->invalidate_lock);
> + WARN_ON_ONCE(!folio_test_locked(folio));
> +
> + if (folio_mapped(folio) || folio_test_guestmem(folio))
> + return -EAGAIN;
> +
> + /* Register a callback first. */
> + __folio_set_guestmem(folio);
> +
> + /*
> + * Check for references after setting the type to guestmem, to guard
> + * against potential races with the refcount being decremented later.
> + *
> + * At least one reference is expected because the folio is locked.
> + */
> +
> + refcount = folio_ref_sub_return(folio, folio_nr_pages(folio));
> + if (refcount == 1) {
> + int r;
> +
> + /* refcount isn't elevated, it's now faultable by the guest. */
> + r = WARN_ON_ONCE(xa_err(xa_store(mappable_offsets, idx, xval_guest, GFP_KERNEL)));
> + if (!r)
> + __kvm_gmem_restore_pending_folio(folio);
> +
> + return r;
> + }
> +
> + return -EAGAIN;
> +}
> +
> +int kvm_slot_gmem_register_callback(struct kvm_memory_slot *slot, gfn_t gfn)
> +{
> + unsigned long pgoff = slot->gmem.pgoff + gfn - slot->base_gfn;
> + struct inode *inode = file_inode(slot->gmem.file);
> + struct folio *folio;
> + int r;
> +
> + filemap_invalidate_lock(inode->i_mapping);
> +
> + folio = filemap_lock_folio(inode->i_mapping, pgoff);
> + if (WARN_ON_ONCE(IS_ERR(folio))) {
> + r = PTR_ERR(folio);
> + goto out;
> + }
> +
> + r = __gmem_register_callback(folio, inode, pgoff);
> +
> + folio_unlock(folio);
> + folio_put(folio);
> +out:
> + filemap_invalidate_unlock(inode->i_mapping);
> +
> + return r;
> +}
> +
> +/*
> + * Callback function for __folio_put(), i.e., called when all references by the
> + * host to the folio have been dropped. This allows gmem to transition the state
> + * of the folio to mappable by the guest, and allows the hypervisor to continue
> + * transitioning its state to private, since the host cannot attempt to access
> + * it anymore.
> + */
> +void kvm_gmem_handle_folio_put(struct folio *folio)
> +{
> + struct xarray *mappable_offsets;
> + struct inode *inode;
> + pgoff_t index;
> + void *xval;
> +
> + inode = folio->mapping->host;
> + index = folio->index;
> + mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
> + xval = xa_mk_value(KVM_GMEM_GUEST_MAPPABLE);
> +
> + filemap_invalidate_lock(inode->i_mapping);
> + __kvm_gmem_restore_pending_folio(folio);
> + WARN_ON_ONCE(xa_err(xa_store(mappable_offsets, index, xval, GFP_KERNEL)));
> + filemap_invalidate_unlock(inode->i_mapping);
> +}
> +
> static bool gmem_is_mappable(struct inode *inode, pgoff_t pgoff)
> {
> struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;