From: Mike Day <michael.day@amd.com>
To: Patrick Roy <roypat@amazon.co.uk>,
tabba@google.com, quic_eberman@quicinc.com, david@redhat.com,
seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com,
ackerleytng@google.com, vannapurve@google.com, rppt@kernel.org
Cc: graf@amazon.com, jgowans@amazon.com, derekmn@amazon.com,
kalyazin@amazon.com, xmarcalx@amazon.com, linux-mm@kvack.org,
corbet@lwn.net, catalin.marinas@arm.com, will@kernel.org,
chenhuacai@kernel.org, kernel@xen0n.name,
paul.walmsley@sifive.com, palmer@dabbelt.com,
aou@eecs.berkeley.edu, hca@linux.ibm.com, gor@linux.ibm.com,
agordeev@linux.ibm.com, borntraeger@linux.ibm.com,
svens@linux.ibm.com, gerald.schaefer@linux.ibm.com,
tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
luto@kernel.org, peterz@infradead.org, rostedt@goodmis.org,
mhiramat@kernel.org, mathieu.desnoyers@efficios.com,
shuah@kernel.org, kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, loongarch@lists.linux.dev,
linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org,
linux-trace-kernel@vger.kernel.org,
linux-kselftest@vger.kernel.org
Subject: Re: [RFC PATCH v3 3/6] kvm: gmem: implement direct map manipulation routines
Date: Thu, 31 Oct 2024 09:19:41 -0500
Message-ID: <80d700e8-5800-4128-b9fd-6bd37525facd@amd.com>
In-Reply-To: <20241030134912.515725-4-roypat@amazon.co.uk>
On 10/30/24 08:49, Patrick Roy wrote:
> Implement (yet unused) routines for manipulating guest_memfd direct map
> state. This is largely for illustration purposes.
>
> kvm_gmem_set_direct_map allows manipulating arbitrary pgoff_t
> ranges, even if the covered memory has not yet been faulted in (in which
> case the requested direct map state is recorded in the xarray and will
> be applied by kvm_gmem_folio_configure_direct_map after the folio is
> faulted in and prepared/populated). This can be used to realize
> private/shared conversions on not-yet-faulted in memory, as discussed in
> the guest_memfd upstream call [1].
>
> kvm_gmem_folio_set_direct_map allows manipulating the direct map entries
> for a gmem folio that the caller already holds a reference for (whereas
> kvm_gmem_set_direct_map needs to look up all folios intersecting the
> given pgoff range in the filemap first).
>
> The xa lock serializes calls to kvm_gmem_folio_set_direct_map and
> kvm_gmem_set_direct_map, while the read side
> (kvm_gmem_folio_configure_direct_map) is protected by RCU. This is
> sufficient to ensure consistency between the xarray and the folio's
> actual direct map state, as kvm_gmem_folio_configure_direct_map is
> called only for freshly allocated folios, and before the folio lock is
> dropped for the first time, meaning kvm_gmem_folio_configure_direct_map
> always does its set_direct_map calls before either of
> kvm_gmem_[folio_]set_direct_map gets a chance. Even if a concurrent call
> to kvm_gmem_[folio_]set_direct_map happens, this ensures a sort of
> "eventual consistency" between xarray and actual direct map
> configuration by the time kvm_gmem_[folio_]set_direct_map exits.
>
> [1]: https://lore.kernel.org/kvm/4b49248b-1cf1-44dc-9b50-ee551e1671ac@redhat.com/
>
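For my own understanding, the ordering argument above reads roughly like this
(my paraphrase, not text from the patch):

	/*
	 * Fault path (fresh folio, still locked):
	 *   folio_lock(folio)
	 *   kvm_gmem_folio_configure_direct_map(folio)  // reads xarray under RCU
	 *   folio_unlock(folio)                         // first unlock
	 *
	 * Writer paths (serialized against each other by the xa_lock):
	 *   xa_lock(); __kvm_gmem_update_xarray();
	 *   folio_lock(); __kvm_gmem_folio_set_direct_map(); folio_unlock();
	 *   xa_unlock();
	 *
	 * A writer can only take the folio lock after that first unlock, so the
	 * configure step's set_direct_map calls always happen first.
	 */
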
> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
> ---
> virt/kvm/guest_memfd.c | 125 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 125 insertions(+)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 50ffc2ad73eda..54387828dcc6a 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -96,6 +96,131 @@ static int kvm_gmem_folio_configure_direct_map(struct folio *folio)
> return r;
> }
>
> +/*
> + * Updates the range [@start, @end] in @gmem_priv's direct map state xarray to be @state,
> + * e.g. erasing entries in this range if @state is the default state, and creating
> + * entries otherwise.
> + *
> + * Assumes the xa_lock is held.
> + */
> +static int __kvm_gmem_update_xarray(struct kvm_gmem_inode_private *gmem_priv, pgoff_t start,
> + pgoff_t end, bool state)
> +{
> + struct xarray *xa = &gmem_priv->direct_map_state;
> + int r = 0;
> +
> + /*
> + * Cannot use xa_store_range, as multi-indexes cannot easily
> + * be partially updated.
> + */
> + for (pgoff_t index = start; index < end; ++index) {
> + if (state == gmem_priv->default_direct_map_state)
> + __xa_erase(xa, index);
> + else
> + /* don't care _what_ we store in the xarray, only care about presence */
> + __xa_store(xa, index, gmem_priv, GFP_KERNEL);
> +
> + r = xa_err(xa);
> + if (r)
> + goto out;
> + }
> +
> +out:
> + return r;
> +}
> +
> +static int __kvm_gmem_folio_set_direct_map(struct folio *folio, pgoff_t start, pgoff_t end,
> + bool state)
> +{
> + unsigned long npages = end - start + 1;
> + struct page *first_page = folio_file_page(folio, start);
> +
> + int r = set_direct_map_valid_noflush(first_page, npages, state);
> +
> + flush_tlb_kernel_range((unsigned long)page_address(first_page),
> + (unsigned long)page_address(first_page) +
> + npages * PAGE_SIZE);
> + return r;
> +}
> +
> +/*
> + * Updates the direct map status for the given range from @start to @end (inclusive), returning
> + * -EINVAL if this range is not completely contained within @folio. Also updates the
> + * xarray stored in the private data of the inode @folio is attached to.
> + *
> + * Takes and drops the folio lock.
> + */
> +static __always_unused int kvm_gmem_folio_set_direct_map(struct folio *folio, pgoff_t start,
> + pgoff_t end, bool state)
> +{
> + struct inode *inode = folio_inode(folio);
> + struct kvm_gmem_inode_private *gmem_priv = inode->i_private;
> + int r = -EINVAL;
> +
> + if (!folio_contains(folio, start) || !folio_contains(folio, end))
> + goto out;
> +
> + xa_lock(&gmem_priv->direct_map_state);
> + r = __kvm_gmem_update_xarray(gmem_priv, start, end, state);
> + if (r)
> + goto unlock_xa;
> +
> + folio_lock(folio);
> + r = __kvm_gmem_folio_set_direct_map(folio, start, end, state);
> + folio_unlock(folio);
> +
> +unlock_xa:
> + xa_unlock(&gmem_priv->direct_map_state);
> +out:
> + return r;
> +}
> +
> +/*
> + * Updates the direct map status for the given range from @start to @end (inclusive)
> + * of @inode. Folios in this range have their direct map entries reconfigured,
> + * and the xarray in the @inode's private data is updated.
> + */
> +static __always_unused int kvm_gmem_set_direct_map(struct inode *inode, pgoff_t start,
> + pgoff_t end, bool state)
> +{
> + struct kvm_gmem_inode_private *gmem_priv = inode->i_private;
> + struct folio_batch fbatch;
> + pgoff_t index = start;
> + unsigned int count, i;
> + int r = 0;
> +
> + xa_lock(&gmem_priv->direct_map_state);
> +
> + r = __kvm_gmem_update_xarray(gmem_priv, start, end, state);
> + if (r)
> + goto out;
> +
The goto out here jumps past the xa_unlock() at the bottom of the function, so
the error path returns with direct_map_state still locked. The lock needs to be
dropped before bailing out, e.g.:

	if (r) {
		xa_unlock(&gmem_priv->direct_map_state);
		goto out;
	}
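
Alternatively, mirroring the unlock_xa label you already use in
kvm_gmem_folio_set_direct_map, the unlock could sit under a label shared by
the error and success paths (just a sketch, the label name is arbitrary; the
out: label then goes away):

	r = __kvm_gmem_update_xarray(gmem_priv, start, end, state);
	if (r)
		goto out_unlock;

	/* ... folio batch walk as in the patch ... */

out_unlock:
	xa_unlock(&gmem_priv->direct_map_state);
	return r;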
thanks,
Mike
> + folio_batch_init(&fbatch);
> + while (!filemap_get_folios(inode->i_mapping, &index, end, &fbatch) && !r) {
> + count = folio_batch_count(&fbatch);
> + for (i = 0; i < count; i++) {
> + struct folio *folio = fbatch.folios[i];
> + pgoff_t folio_start = max(folio_index(folio), start);
> + pgoff_t folio_end =
> + min(folio_index(folio) + folio_nr_pages(folio),
> + end);
> +
> + folio_lock(folio);
> + r = __kvm_gmem_folio_set_direct_map(folio, folio_start,
> + folio_end, state);
> + folio_unlock(folio);
> +
> + if (r)
> + break;
> + }
> + folio_batch_release(&fbatch);
> + }
> +
> + xa_unlock(&gmem_priv->direct_map_state);
> +out:
> + return r;
> +}
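Just to confirm the @state polarity as I read it (state == true meaning
"present in the direct map", matching set_direct_map_valid_noflush()), a
hypothetical caller removing a possibly not-yet-faulted range would be
roughly:

	/* Illustration only, not part of this patch. */
	r = kvm_gmem_set_direct_map(file_inode(file), start, end, false);
	/*
	 * Folios already in the filemap get their direct map entries updated
	 * here; holes are only recorded in the xarray and handled later by
	 * kvm_gmem_folio_configure_direct_map() at fault time.
	 */
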
> +
> /**
> * folio_file_pfn - like folio_file_page, but return a pfn.
> * @folio: The folio which contains this index.