From: Patrick Roy <roypat@amazon.co.uk>
To: <tabba@google.com>, <quic_eberman@quicinc.com>,
<david@redhat.com>, <seanjc@google.com>, <pbonzini@redhat.com>,
<jthoughton@google.com>, <ackerleytng@google.com>,
<vannapurve@google.com>, <rppt@kernel.org>
Cc: Patrick Roy <roypat@amazon.co.uk>, <graf@amazon.com>,
<jgowans@amazon.com>, <derekmn@amazon.com>, <kalyazin@amazon.com>,
<xmarcalx@amazon.com>, <linux-mm@kvack.org>, <corbet@lwn.net>,
<catalin.marinas@arm.com>, <will@kernel.org>,
<chenhuacai@kernel.org>, <kernel@xen0n.name>,
<paul.walmsley@sifive.com>, <palmer@dabbelt.com>,
<aou@eecs.berkeley.edu>, <hca@linux.ibm.com>, <gor@linux.ibm.com>,
<agordeev@linux.ibm.com>, <borntraeger@linux.ibm.com>,
<svens@linux.ibm.com>, <gerald.schaefer@linux.ibm.com>,
<tglx@linutronix.de>, <mingo@redhat.com>, <bp@alien8.de>,
<dave.hansen@linux.intel.com>, <x86@kernel.org>, <hpa@zytor.com>,
<luto@kernel.org>, <peterz@infradead.org>, <rostedt@goodmis.org>,
<mhiramat@kernel.org>, <mathieu.desnoyers@efficios.com>,
<shuah@kernel.org>, <kvm@vger.kernel.org>,
<linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<linux-arm-kernel@lists.infradead.org>,
<loongarch@lists.linux.dev>, <linux-riscv@lists.infradead.org>,
<linux-s390@vger.kernel.org>,
<linux-trace-kernel@vger.kernel.org>,
<linux-kselftest@vger.kernel.org>
Subject: [RFC PATCH v3 3/6] kvm: gmem: implement direct map manipulation routines
Date: Wed, 30 Oct 2024 13:49:07 +0000
Message-ID: <20241030134912.515725-4-roypat@amazon.co.uk>
In-Reply-To: <20241030134912.515725-1-roypat@amazon.co.uk>

Implement (as yet unused) routines for manipulating guest_memfd direct map
state. This is largely for illustration purposes.

kvm_gmem_set_direct_map allows manipulating arbitrary pgoff_t
ranges, even if the covered memory has not yet been faulted in (in which
case the requested direct map state is recorded in the xarray and will
be applied by kvm_gmem_folio_configure_direct_map after the folio is
faulted in and prepared/populated). This can be used to realize
private/shared conversions on not-yet-faulted-in memory, as discussed in
the guest_memfd upstream call [1].

kvm_gmem_folio_set_direct_map allows manipulating the direct map entries
for a gmem folio that the caller already holds a reference to (whereas
kvm_gmem_set_direct_map needs to look up all folios intersecting the
given pgoff range in the filemap first).

The xa lock serializes calls to kvm_gmem_folio_set_direct_map and
kvm_gmem_set_direct_map, while the read side
(kvm_gmem_folio_configure_direct_map) is protected by RCU. This is
sufficient to ensure consistency between the xarray and the folio's
actual direct map state, as kvm_gmem_folio_configure_direct_map is
called only for freshly allocated folios, and before the folio lock is
dropped for the first time, meaning kvm_gmem_folio_configure_direct_map
always does its set_direct_map calls before either of
kvm_gmem_[folio_]set_direct_map gets a chance. Even if a concurrent call
to kvm_gmem_[folio_]set_direct_map happens, this ensures a sort of
"eventual consistency" between the xarray and the actual direct map
configuration by the time kvm_gmem_[folio_]set_direct_map exits.

[1]: https://lore.kernel.org/kvm/4b49248b-1cf1-44dc-9b50-ee551e1671ac@redhat.com/

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
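
Note on the bookkeeping described in the commit message: the xarray only
needs to record which page offsets deviate from the inode's default direct
map state, so that the right state can be applied once a folio is faulted
in. The following is a minimal userspace sketch of that idea, not the
kernel implementation. It swaps the xarray for a fixed-size bool array, and
the names dm_model, dm_model_update, dm_model_lookup and MODEL_NPAGES are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in; this is NOT the kernel's xarray. */
#define MODEL_NPAGES 16

struct dm_model {
	bool default_state;           /* state assumed for untracked offsets */
	bool override[MODEL_NPAGES];  /* true: offset deviates from default */
};

/*
 * Record @state for the inclusive range [start, end]. Offsets set back to
 * the default are "erased"; all others are marked as present.
 */
static int dm_model_update(struct dm_model *m, size_t start, size_t end,
			   bool state)
{
	if (start > end || end >= MODEL_NPAGES)
		return -1;

	for (size_t i = start; i <= end; i++)
		m->override[i] = (state != m->default_state);

	return 0;
}

/* State that a folio at @index should receive when it is faulted in. */
static bool dm_model_lookup(const struct dm_model *m, size_t index)
{
	return m->override[index] ? !m->default_state : m->default_state;
}
```

Only presence matters in this model, which mirrors the "don't care _what_
we store" comment in the patch: a lookup returns the default for absent
offsets and the opposite state for present ones.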
virt/kvm/guest_memfd.c | 125 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 125 insertions(+)
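
Note on range arithmetic: all ranges in this patch are documented as
inclusive, so the last offset covered by a folio is its first offset plus
its page count minus one, and the length of [start, end] is
end - start + 1. The sketch below works through the clamping of a request
to a single folio in plain userspace C. The folio_span type and the
clamp_to_folio helper are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Hypothetical description of one folio: first offset and page count. */
struct folio_span {
	size_t first;  /* first page offset covered by the folio */
	size_t nr;     /* number of pages in the folio, at least 1 */
};

/*
 * Clamp the inclusive request [start, end] to the folio. Returns the
 * number of pages in the overlap, or 0 if the two do not intersect.
 * *out_start and *out_end are only written when the overlap is non-empty.
 */
static size_t clamp_to_folio(struct folio_span f, size_t start, size_t end,
			     size_t *out_start, size_t *out_end)
{
	size_t last = f.first + f.nr - 1;  /* last offset inside the folio */
	size_t s = start > f.first ? start : f.first;
	size_t e = end < last ? end : last;

	if (s > e)
		return 0;

	*out_start = s;
	*out_end = e;
	return e - s + 1;                  /* length of an inclusive range */
}
```

For a four-page folio starting at offset 8, a request for [0, 100] clamps
to [8, 11], which is four pages. Without the "- 1", the clamped range would
extend one page past the folio.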
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 50ffc2ad73eda..54387828dcc6a 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -96,6 +96,131 @@ static int kvm_gmem_folio_configure_direct_map(struct folio *folio)
 	return r;
 }

+/*
+ * Updates the range [@start, @end] in @gmem_priv's direct map state xarray to be @state,
+ * i.e. erasing entries in this range if @state is the default state, and creating
+ * entries otherwise.
+ *
+ * Assumes the xa_lock is held.
+ */
+static int __kvm_gmem_update_xarray(struct kvm_gmem_inode_private *gmem_priv, pgoff_t start,
+				    pgoff_t end, bool state)
+{
+	struct xarray *xa = &gmem_priv->direct_map_state;
+	int r = 0;
+
+	/*
+	 * Cannot use xa_store_range, as multi-indexes cannot easily
+	 * be partially updated.
+	 */
+	for (pgoff_t index = start; index <= end; ++index) {
+		if (state == gmem_priv->default_direct_map_state)
+			__xa_erase(xa, index);
+		else
+			/* don't care _what_ we store in the xarray, only care about presence */
+			r = xa_err(__xa_store(xa, index, gmem_priv,
+					      GFP_KERNEL));
+
+		if (r)
+			goto out;
+	}
+
+out:
+	return r;
+}
+
+static int __kvm_gmem_folio_set_direct_map(struct folio *folio, pgoff_t start, pgoff_t end,
+					   bool state)
+{
+	unsigned long npages = end - start + 1;
+	struct page *first_page = folio_file_page(folio, start);
+
+	int r = set_direct_map_valid_noflush(first_page, npages, state);
+
+	flush_tlb_kernel_range((unsigned long)page_address(first_page),
+			       (unsigned long)page_address(first_page) +
+				       npages * PAGE_SIZE);
+	return r;
+}
+
+/*
+ * Updates the direct map status for the given range from @start to @end (inclusive), returning
+ * -EINVAL if this range is not completely contained within @folio. Also updates the
+ * xarray stored in the private data of the inode @folio is attached to.
+ *
+ * Takes and drops the folio lock.
+ */
+static __always_unused int kvm_gmem_folio_set_direct_map(struct folio *folio, pgoff_t start,
+							  pgoff_t end, bool state)
+{
+	struct inode *inode = folio_inode(folio);
+	struct kvm_gmem_inode_private *gmem_priv = inode->i_private;
+	int r = -EINVAL;
+
+	if (!folio_contains(folio, start) || !folio_contains(folio, end))
+		goto out;
+
+	xa_lock(&gmem_priv->direct_map_state);
+	r = __kvm_gmem_update_xarray(gmem_priv, start, end, state);
+	if (r)
+		goto unlock_xa;
+
+	folio_lock(folio);
+	r = __kvm_gmem_folio_set_direct_map(folio, start, end, state);
+	folio_unlock(folio);
+
+unlock_xa:
+	xa_unlock(&gmem_priv->direct_map_state);
+out:
+	return r;
+}
+
+/*
+ * Updates the direct map status for the given range from @start to @end (inclusive)
+ * of @inode. Folios in this range have their direct map entries reconfigured,
+ * and the xarray in the @inode's private data is updated.
+ */
+static __always_unused int kvm_gmem_set_direct_map(struct inode *inode, pgoff_t start,
+						    pgoff_t end, bool state)
+{
+	struct kvm_gmem_inode_private *gmem_priv = inode->i_private;
+	struct folio_batch fbatch;
+	pgoff_t index = start;
+	unsigned int count, i;
+	int r = 0;
+
+	xa_lock(&gmem_priv->direct_map_state);
+
+	r = __kvm_gmem_update_xarray(gmem_priv, start, end, state);
+	if (r)
+		goto out;
+
+	folio_batch_init(&fbatch);
+	while (!r && filemap_get_folios(inode->i_mapping, &index, end, &fbatch)) {
+		count = folio_batch_count(&fbatch);
+		for (i = 0; i < count; i++) {
+			struct folio *folio = fbatch.folios[i];
+			pgoff_t folio_start = max(folio_index(folio), start);
+			pgoff_t folio_end =
+				min(folio_index(folio) + folio_nr_pages(folio) - 1,
+				    end);
+
+			folio_lock(folio);
+			r = __kvm_gmem_folio_set_direct_map(folio, folio_start,
+							    folio_end, state);
+			folio_unlock(folio);
+
+			if (r)
+				break;
+		}
+		folio_batch_release(&fbatch);
+	}
+
+out:
+	xa_unlock(&gmem_priv->direct_map_state);
+	return r;
+}
+
 /**
  * folio_file_pfn - like folio_file_page, but return a pfn.
  * @folio: The folio which contains this index.
--
2.47.0