From: Mike Day <michael.day@amd.com>
To: Patrick Roy <roypat@amazon.co.uk>,
	tabba@google.com, quic_eberman@quicinc.com, david@redhat.com,
	seanjc@google.com, pbonzini@redhat.com, jthoughton@google.com,
	ackerleytng@google.com, vannapurve@google.com, rppt@kernel.org
Cc: graf@amazon.com, jgowans@amazon.com, derekmn@amazon.com,
	kalyazin@amazon.com, xmarcalx@amazon.com, linux-mm@kvack.org,
	corbet@lwn.net, catalin.marinas@arm.com, will@kernel.org,
	chenhuacai@kernel.org, kernel@xen0n.name,
	paul.walmsley@sifive.com, palmer@dabbelt.com,
	aou@eecs.berkeley.edu, hca@linux.ibm.com, gor@linux.ibm.com,
	agordeev@linux.ibm.com, borntraeger@linux.ibm.com,
	svens@linux.ibm.com, gerald.schaefer@linux.ibm.com,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
	luto@kernel.org, peterz@infradead.org, rostedt@goodmis.org,
	mhiramat@kernel.org, mathieu.desnoyers@efficios.com,
	shuah@kernel.org, kvm@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, loongarch@lists.linux.dev,
	linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org
Subject: Re: [RFC PATCH v3 3/6] kvm: gmem: implement direct map manipulation routines
Date: Thu, 31 Oct 2024 09:19:41 -0500	[thread overview]
Message-ID: <80d700e8-5800-4128-b9fd-6bd37525facd@amd.com> (raw)
In-Reply-To: <20241030134912.515725-4-roypat@amazon.co.uk>



On 10/30/24 08:49, Patrick Roy wrote:
> Implement (yet unused) routines for manipulating guest_memfd direct map
> state. This is largely for illustration purposes.
> 
> kvm_gmem_set_direct_map allows manipulating arbitrary pgoff_t
> ranges, even if the covered memory has not yet been faulted in (in which
> case the requested direct map state is recorded in the xarray and will
> be applied by kvm_gmem_folio_configure_direct_map after the folio is
> faulted in and prepared/populated). This can be used to realize
> private/shared conversions on not-yet-faulted-in memory, as discussed in
> the guest_memfd upstream call [1].
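
As a concrete usage example of this routine (my guess at the calling
convention, not something taken from this series): a shared->private
conversion over a not-yet-faulted pgoff range would presumably boil down to

	/*
	 * Illustrative only. Assuming state == false means "not present in
	 * the direct map", which is how __kvm_gmem_folio_set_direct_map()
	 * below passes it through to set_direct_map_valid_noflush().
	 */
	r = kvm_gmem_set_direct_map(inode, start, end, false);

with the requested state recorded in the xarray and only applied by
kvm_gmem_folio_configure_direct_map() once the folio is actually faulted in.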
> 
> kvm_gmem_folio_set_direct_map allows manipulating the direct map entries
> for a gmem folio that the caller already holds a reference for (whereas
> kvm_gmem_set_direct_map needs to look up all folios intersecting the
> given pgoff range in the filemap first).
> 
> The xa lock serializes calls to kvm_gmem_folio_set_direct_map and
> kvm_gmem_set_direct_map, while the read side
> (kvm_gmem_folio_configure_direct_map) is protected by RCU. This is
> sufficient to ensure consistency between the xarray and the folio's
> actual direct map state, as kvm_gmem_folio_configure_direct_map is
> called only for freshly allocated folios, and before the folio lock is
> dropped for the first time, meaning kvm_gmem_folio_configure_direct_map
> always does its set_direct_map calls before either of
> kvm_gmem_[folio_]set_direct_map gets a chance. Even if a concurrent call
> to kvm_gmem_[folio_]set_direct_map happens, this ensures a sort of
> "eventual consistency" between xarray and actual direct map
> configuration by the time kvm_gmem_[folio_]set_direct_map exits.
> 
> [1]: https://lore.kernel.org/kvm/4b49248b-1cf1-44dc-9b50-ee551e1671ac@redhat.com/
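
For anyone reading along, the read side referred to here presumably boils
down to an RCU-protected presence check against the same xarray, roughly
like the sketch below (my illustration of the described scheme, not the
actual kvm_gmem_folio_configure_direct_map() code; the local variable is
made up):

	rcu_read_lock();
	/* only presence matters; a present entry marks the non-default state */
	non_default = xa_load(&gmem_priv->direct_map_state, index) != NULL;
	rcu_read_unlock();

which, together with default_direct_map_state, decides whether
set_direct_map_valid_noflush() needs to be called for the freshly
allocated folio.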
> 
> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
> ---
>   virt/kvm/guest_memfd.c | 125 +++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 125 insertions(+)
> 
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 50ffc2ad73eda..54387828dcc6a 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -96,6 +96,131 @@ static int kvm_gmem_folio_configure_direct_map(struct folio *folio)
>   	return r;
>   }
>   
> +/*
> + * Updates the range [@start, @end] in @gmem_priv's direct map state xarray to be @state,
> + * e.g. erasing entries in this range if @state is the default state, and creating
> + * entries otherwise.
> + *
> + * Assumes the xa_lock is held.
> + */
> +static int __kvm_gmem_update_xarray(struct kvm_gmem_inode_private *gmem_priv, pgoff_t start,
> +				    pgoff_t end, bool state)
> +{
> +	struct xarray *xa = &gmem_priv->direct_map_state;
> +	int r = 0;
> +
> +	/*
> +	 * Cannot use xa_store_range, as multi-indexes cannot easily
> +	 * be partially updated.
> +	 */
> +	for (pgoff_t index = start; index < end; ++index) {
> +		if (state == gmem_priv->default_direct_map_state)
> +			__xa_erase(xa, index);
> +		else
> +			/* don't care _what_ we store in the xarray, only care about presence */
> +			__xa_store(xa, index, gmem_priv, GFP_KERNEL);
> +
> +		r = xa_err(xa);
> +		if (r)
> +			goto out;
> +	}
> +
> +out:
> +	return r;
> +}
> +
> +static int __kvm_gmem_folio_set_direct_map(struct folio *folio, pgoff_t start, pgoff_t end,
> +					   bool state)
> +{
> +	unsigned long npages = end - start + 1;
> +	struct page *first_page = folio_file_page(folio, start);
> +
> +	int r = set_direct_map_valid_noflush(first_page, npages, state);
> +
> +	flush_tlb_kernel_range((unsigned long)page_address(first_page),
> +			       (unsigned long)page_address(first_page) +
> +				       npages * PAGE_SIZE);
> +	return r;
> +}
> +
> +/*
> + * Updates the direct map status for the given range from @start to @end (inclusive), returning
> + * -EINVAL if this range is not completely contained within @folio. Also updates the
> + * xarray stored in the private data of the inode @folio is attached to.
> + *
> + * Takes and drops the folio lock.
> + */
> +static __always_unused int kvm_gmem_folio_set_direct_map(struct folio *folio, pgoff_t start,
> +								 pgoff_t end, bool state)
> +{
> +	struct inode *inode = folio_inode(folio);
> +	struct kvm_gmem_inode_private *gmem_priv = inode->i_private;
> +	int r = -EINVAL;
> +
> +	if (!folio_contains(folio, start) || !folio_contains(folio, end))
> +		goto out;
> +
> +	xa_lock(&gmem_priv->direct_map_state);
> +	r = __kvm_gmem_update_xarray(gmem_priv, start, end, state);
> +	if (r)
> +		goto unlock_xa;
> +
> +	folio_lock(folio);
> +	r = __kvm_gmem_folio_set_direct_map(folio, start, end, state);
> +	folio_unlock(folio);
> +
> +unlock_xa:
> +	xa_unlock(&gmem_priv->direct_map_state);
> +out:
> +	return r;
> +}
> +
> +/*
> + * Updates the direct map status for the given range from @start to @end (inclusive)
> + * of @inode. Folios in this range have their direct map entries reconfigured,
> + * and the xarray in the @inode's private data is updated.
> + */
> +static __always_unused int kvm_gmem_set_direct_map(struct inode *inode, pgoff_t start,
> +							   pgoff_t end, bool state)
> +{
> +	struct kvm_gmem_inode_private *gmem_priv = inode->i_private;
> +	struct folio_batch fbatch;
> +	pgoff_t index = start;
> +	unsigned int count, i;
> +	int r = 0;
> +
> +	xa_lock(&gmem_priv->direct_map_state);
> +
> +	r = __kvm_gmem_update_xarray(gmem_priv, start, end, state);
> +	if (r)
> +		goto out;
> +
If __kvm_gmem_update_xarray() fails, this goto jumps straight to out: and
skips the xa_unlock() at the bottom of the function, so we return with the
xa_lock still held. The error path needs to drop the lock first, e.g.:

	if (r) {
		xa_unlock(&gmem_priv->direct_map_state);
		goto out;
	}
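
Or, since the lock is taken unconditionally before this point, the unlock
could simply move under the exit label, mirroring the unlock_xa: label
pattern that kvm_gmem_folio_set_direct_map() already uses. Rough sketch
only (the out_unlock name is made up):

	r = __kvm_gmem_update_xarray(gmem_priv, start, end, state);
	if (r)
		goto out_unlock;

	/* ... filemap_get_folios() loop unchanged ... */

out_unlock:
	xa_unlock(&gmem_priv->direct_map_state);
	return r;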

thanks,

Mike

> +	folio_batch_init(&fbatch);
> +	while (!filemap_get_folios(inode->i_mapping, &index, end, &fbatch) && !r) {
> +		count = folio_batch_count(&fbatch);
> +		for (i = 0; i < count; i++) {
> +			struct folio *folio = fbatch.folios[i];
> +			pgoff_t folio_start = max(folio_index(folio), start);
> +			pgoff_t folio_end =
> +				min(folio_index(folio) + folio_nr_pages(folio),
> +				    end);
> +
> +			folio_lock(folio);
> +			r = __kvm_gmem_folio_set_direct_map(folio, folio_start,
> +							    folio_end, state);
> +			folio_unlock(folio);
> +
> +			if (r)
> +				break;
> +		}
> +		folio_batch_release(&fbatch);
> +	}
> +
> +	xa_unlock(&gmem_priv->direct_map_state);
> +out:
> +	return r;
> +}
> +
>   /**
>    * folio_file_pfn - like folio_file_page, but return a pfn.
>    * @folio: The folio which contains this index.

