From: Mike Rapoport <rppt@kernel.org>
To: Patrick Roy <roypat@amazon.co.uk>
Cc: seanjc@google.com, pbonzini@redhat.com,
akpm@linux-foundation.org, dwmw@amazon.co.uk, david@redhat.com,
tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
willy@infradead.org, graf@amazon.com, derekmn@amazon.com,
kalyazin@amazon.com, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
dmatlack@google.com, tabba@google.com,
chao.p.peng@linux.intel.com, xmarcalx@amazon.co.uk
Subject: Re: [RFC PATCH 5/8] kvm: gmem: add option to remove guest private memory from direct map
Date: Wed, 10 Jul 2024 10:31:35 +0300 [thread overview]
Message-ID: <Zo441yz7Yw2JZcPs@kernel.org> (raw)
In-Reply-To: <20240709132041.3625501-6-roypat@amazon.co.uk>
On Tue, Jul 09, 2024 at 02:20:33PM +0100, Patrick Roy wrote:
> While guest_memfd is not available to be mapped by userspace, it is
> still accessible through the kernel's direct map. This means that in
> scenarios where guest-private memory is not hardware protected, it can
> be speculatively read and its contents potentially leaked through
> hardware side-channels. Removing guest-private memory from the direct
> map, thus mitigates a large class of speculative execution issues
> [1, Table 1].
>
> This patch adds a flag to the `KVM_CREATE_GUEST_MEMFD` which, if set, removes the
> struct pages backing guest-private memory from the direct map. Should
> `CONFIG_HAVE_KVM_GMEM_{INVALIDATE, PREPARE}` be set, pages are removed
> after preparation and before invalidation, so that the
> prepare/invalidate routines do not have to worry about potentially
> absent direct map entries.
>
> Direct map removal do not reuse the `KVM_GMEM_PREPARE` machinery, since `prepare` can be
> called multiple time, and it is the responsibility of the preparation
> routine to not "prepare" the same folio twice [2]. Thus, instead
> explicitly check if `filemap_grab_folio` allocated a new folio, and
> remove the returned folio from the direct map only if this was the case.
>
> The patch uses release_folio instead of free_folio to reinsert pages
> back into the direct map as by the time free_folio is called,
> folio->mapping can already be NULL. This means that a call to
> folio_inode inside free_folio might deference a NULL pointer, leaving no
> way to access the inode which stores the flags that allow determining
> whether the page was removed from the direct map in the first place.
>
> Lastly, the patch uses set_direct_map_{invalid,default}_noflush instead
> of `set_memory_[n]p` to avoid expensive flushes of TLBs and the L*-cache
> hierarchy. This is especially important once KVM restores direct map
> entries on-demand in later patches, where simple FIO benchmarks of a
> virtio-blk device have shown that TLB flushes on a Intel(R) Xeon(R)
> Platinum 8375C CPU @ 2.90GHz resulted in 80% degradation in throughput
> compared to a non-flushing solution.
>
> Not flushing the TLB means that until TLB entries for temporarily
> restored direct map entries get naturally evicted, they can be used
> during speculative execution, and effectively "unhide" the memory for
> longer than intended. We consider this acceptable, as the only pages
> that are temporarily reinserted into the direct map like this will
> either hold PV data structures (kvm-clock, asyncpf, etc), or pages
> containing privileged instructions inside the guest kernel image (in the
> MMIO emulation case).
>
> [1]: https://download.vusec.net/papers/quarantine_raid23.pdf
>
> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
> ---
> include/uapi/linux/kvm.h | 2 ++
> virt/kvm/guest_memfd.c | 52 ++++++++++++++++++++++++++++++++++------
> 2 files changed, 47 insertions(+), 7 deletions(-)
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index e065d9fe7ab2..409116aa23c9 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1563,4 +1563,6 @@ struct kvm_create_guest_memfd {
> __u64 reserved[6];
> };
>
> +#define KVM_GMEM_NO_DIRECT_MAP (1ULL << 0)
> +
> #endif /* __LINUX_KVM_H */
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 9148b9679bb1..dc9b0c2d0b0e 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -4,6 +4,7 @@
> #include <linux/kvm_host.h>
> #include <linux/pagemap.h>
> #include <linux/anon_inodes.h>
> +#include <linux/set_memory.h>
>
> #include "kvm_mm.h"
>
> @@ -49,9 +50,16 @@ static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct fol
> return 0;
> }
>
> +static bool kvm_gmem_not_present(struct inode *inode)
> +{
> + return ((unsigned long)inode->i_private & KVM_GMEM_NO_DIRECT_MAP) != 0;
> +}
> +
> static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool prepare)
> {
> struct folio *folio;
> + bool zap_direct_map = false;
> + int r;
>
> /* TODO: Support huge pages. */
> folio = filemap_grab_folio(inode->i_mapping, index);
> @@ -74,16 +82,30 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool
> for (i = 0; i < nr_pages; i++)
> clear_highpage(folio_page(folio, i));
>
> + // We need to clear the folio before calling kvm_gmem_prepare_folio,
> + // but can only remove it from the direct map _after_ preparation is done.
No C++ comments please
> + zap_direct_map = kvm_gmem_not_present(inode);
> +
> folio_mark_uptodate(folio);
> }
>
> if (prepare) {
> - int r = kvm_gmem_prepare_folio(inode, index, folio);
> - if (r < 0) {
> - folio_unlock(folio);
> - folio_put(folio);
> - return ERR_PTR(r);
> - }
> + r = kvm_gmem_prepare_folio(inode, index, folio);
> + if (r < 0)
> + goto out_err;
> + }
> +
> + if (zap_direct_map) {
> + r = set_direct_map_invalid_noflush(&folio->page);
It's not future proof to presume that folio is a single page here.
You should loop over folio pages and add a TLB flush after the loop.
> + if (r < 0)
> + goto out_err;
> +
> + // We use the private flag to track whether the folio has been removed
> + // from the direct map. This is because inside of ->free_folio,
> + // we do not have access to the address_space anymore, meaning we
> + // cannot check folio_inode(folio)->i_private to determine whether
> + // KVM_GMEM_NO_DIRECT_MAP was set.
> + folio_set_private(folio);
> }
>
> /*
> @@ -91,6 +113,10 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool
> * unevictable and there is no storage to write back to.
> */
> return folio;
> +out_err:
> + folio_unlock(folio);
> + folio_put(folio);
> + return ERR_PTR(r);
> }
>
> static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
> @@ -354,10 +380,22 @@ static void kvm_gmem_free_folio(struct folio *folio)
> }
> #endif
>
> +static void kvm_gmem_invalidate_folio(struct folio *folio, size_t start, size_t end)
> +{
> + if (start == 0 && end == PAGE_SIZE) {
> + // We only get here if PG_private is set, which only happens if kvm_gmem_not_present
> + // returned true in kvm_gmem_get_folio. Thus no need to do that check again.
> + BUG_ON(set_direct_map_default_noflush(&folio->page));
Ditto.
> +
> + folio_clear_private(folio);
> + }
> +}
> +
> static const struct address_space_operations kvm_gmem_aops = {
> .dirty_folio = noop_dirty_folio,
> .migrate_folio = kvm_gmem_migrate_folio,
> .error_remove_folio = kvm_gmem_error_folio,
> + .invalidate_folio = kvm_gmem_invalidate_folio,
> #ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE
> .free_folio = kvm_gmem_free_folio,
> #endif
> @@ -443,7 +481,7 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
> {
> loff_t size = args->size;
> u64 flags = args->flags;
> - u64 valid_flags = 0;
> + u64 valid_flags = KVM_GMEM_NO_DIRECT_MAP;
>
> if (flags & ~valid_flags)
> return -EINVAL;
> --
> 2.45.2
>
--
Sincerely yours,
Mike.
next prev parent reply other threads:[~2024-07-10 7:34 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-09 13:20 [RFC PATCH 0/8] Unmapping guest_memfd from Direct Map Patrick Roy
2024-07-09 13:20 ` [RFC PATCH 1/8] kvm: Allow reading/writing gmem using kvm_{read,write}_guest Patrick Roy
2024-07-09 13:20 ` [RFC PATCH 2/8] kvm: use slowpath in gfn_to_hva_cache if memory is private Patrick Roy
2024-07-09 13:20 ` [RFC PATCH 3/8] kvm: pfncache: enlighten about gmem Patrick Roy
2024-07-09 14:36 ` David Woodhouse
2024-07-10 9:49 ` Patrick Roy
2024-07-10 10:20 ` David Woodhouse
2024-07-10 10:46 ` Patrick Roy
2024-07-10 10:50 ` David Woodhouse
2024-07-09 13:20 ` [RFC PATCH 4/8] kvm: x86: support walking guest page tables in gmem Patrick Roy
2024-07-09 13:20 ` [RFC PATCH 5/8] kvm: gmem: add option to remove guest private memory from direct map Patrick Roy
2024-07-10 7:31 ` Mike Rapoport [this message]
2024-07-10 9:50 ` Patrick Roy
2024-07-09 13:20 ` [RFC PATCH 6/8] kvm: gmem: Temporarily restore direct map entries when needed Patrick Roy
2024-07-11 6:25 ` Paolo Bonzini
2024-07-09 13:20 ` [RFC PATCH 7/8] mm: secretmem: use AS_INACCESSIBLE to prohibit GUP Patrick Roy
2024-07-09 21:09 ` David Hildenbrand
2024-07-10 7:32 ` Mike Rapoport
2024-07-10 9:50 ` Patrick Roy
2024-07-10 21:14 ` David Hildenbrand
2024-07-09 13:20 ` [RFC PATCH 8/8] kvm: gmem: Allow restricted userspace mappings Patrick Roy
2024-07-09 14:48 ` Fuad Tabba
2024-07-09 21:13 ` David Hildenbrand
2024-07-10 9:51 ` Patrick Roy
2024-07-10 21:12 ` David Hildenbrand
2024-07-10 21:53 ` Sean Christopherson
2024-07-10 21:56 ` David Hildenbrand
2024-07-12 15:59 ` Patrick Roy
2024-07-30 10:15 ` David Hildenbrand
2024-08-01 10:30 ` Patrick Roy
2024-07-22 12:28 ` [RFC PATCH 0/8] Unmapping guest_memfd from Direct Map Vlastimil Babka (SUSE)
2024-07-26 6:55 ` Patrick Roy
2024-07-30 10:17 ` David Hildenbrand
2024-07-26 16:44 ` Yosry Ahmed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zo441yz7Yw2JZcPs@kernel.org \
--to=rppt@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=bp@alien8.de \
--cc=chao.p.peng@linux.intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=david@redhat.com \
--cc=derekmn@amazon.com \
--cc=dmatlack@google.com \
--cc=dwmw@amazon.co.uk \
--cc=graf@amazon.com \
--cc=hpa@zytor.com \
--cc=kalyazin@amazon.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mingo@redhat.com \
--cc=pbonzini@redhat.com \
--cc=roypat@amazon.co.uk \
--cc=seanjc@google.com \
--cc=tabba@google.com \
--cc=tglx@linutronix.de \
--cc=willy@infradead.org \
--cc=x86@kernel.org \
--cc=xmarcalx@amazon.co.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox