linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Ackerley Tng <ackerleytng@google.com>,
	cgroups@vger.kernel.org, kvm@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org,
	x86@kernel.org
Cc: akpm@linux-foundation.org, binbin.wu@linux.intel.com,
	bp@alien8.de, brauner@kernel.org, chao.p.peng@intel.com,
	chenhuacai@kernel.org, corbet@lwn.net, dave.hansen@intel.com,
	dave.hansen@linux.intel.com, david@redhat.com,
	dmatlack@google.com, erdemaktas@google.com, fan.du@intel.com,
	fvdl@google.com, haibo1.xu@intel.com, hannes@cmpxchg.org,
	hch@infradead.org, hpa@zytor.com, hughd@google.com,
	ira.weiny@intel.com, isaku.yamahata@intel.com, jack@suse.cz,
	james.morse@arm.com, jarkko@kernel.org, jgg@ziepe.ca,
	jgowans@amazon.com, jhubbard@nvidia.com, jroedel@suse.de,
	jthoughton@google.com, jun.miao@intel.com, kai.huang@intel.com,
	keirf@google.com, kent.overstreet@linux.dev,
	liam.merwick@oracle.com, maciej.wieczor-retman@intel.com,
	mail@maciej.szmigiero.name, maobibo@loongson.cn,
	mathieu.desnoyers@efficios.com, maz@kernel.org,
	mhiramat@kernel.org, mhocko@kernel.org, mic@digikod.net,
	michael.roth@amd.com, mingo@redhat.com, mlevitsk@redhat.com,
	mpe@ellerman.id.au, muchun.song@linux.dev, nikunj@amd.com,
	nsaenz@amazon.es, oliver.upton@linux.dev, palmer@dabbelt.com,
	pankaj.gupta@amd.com, paul.walmsley@sifive.com,
	pbonzini@redhat.com, peterx@redhat.com, pgonda@google.com,
	prsampat@amd.com, pvorel@suse.cz, qperret@google.com,
	richard.weiyang@gmail.com, rick.p.edgecombe@intel.com,
	rientjes@google.com, rostedt@goodmis.org, roypat@amazon.co.uk,
	rppt@kernel.org, seanjc@google.com, shakeel.butt@linux.dev,
	shuah@kernel.org, steven.price@arm.com,
	steven.sistare@oracle.com, suzuki.poulose@arm.com,
	tabba@google.com, tglx@linutronix.de, thomas.lendacky@amd.com,
	vannapurve@google.com, viro@zeniv.linux.org.uk,
	vkuznets@redhat.com, wei.w.wang@intel.com, will@kernel.org,
	willy@infradead.org, wyihan@google.com, xiaoyao.li@intel.com,
	yan.y.zhao@intel.com, yilun.xu@intel.com, yuzenghui@huawei.com,
	zhiquan1.li@intel.com
Subject: Re: [RFC PATCH v1 09/37] KVM: guest_memfd: Skip LRU for guest_memfd folios
Date: Mon, 27 Oct 2025 14:56:38 +0100	[thread overview]
Message-ID: <ce78da49-2525-44af-9cb8-301bf6a4658a@suse.cz> (raw)
In-Reply-To: <02aad35b728f4918e62dc6eb1d1d5546487b099e.1760731772.git.ackerleytng@google.com>

On 10/17/25 22:11, Ackerley Tng wrote:
> filemap_add_folio(), called from filemap_grab_folio(), adds folios to
> an LRU list. This is unnecessary for guest_memfd, which does not
> participate in swapping.

IIRC guest_memfd mappings are unevictable. That should mean they are not
ultimately added to a list (see lruvec_add_folio()).

> In addition, the LRU list takes a reference count on the folio. With

IIUC the refcount is temporary while being on the percpu
&cpu_fbatches.lru_add, added by __folio_batch_add_and_move(). When flushed
via folio_batch_move_lru(), the refcount is removed and there's only the LRU
folio flag that remains. The fbatch flushing can be triggered if you see an
unexpected refcount increase. So it might be feasible to do without this
patch (maybe it was already tried and there were substantial issues, in
which case should be mentioned).

> shared-to-private memory conversions for KVM guests dependent on folio
> refcounts, this extra reference can cause conversions to fail due to
> unexpected refcounts.
> 
> Rework kvm_gmem_get_folio() to manually allocate and insert the folio
> into the page cache without placing it on the LRU. This is done by
> calling __filemap_add_folio() directly.
> 
> The folio is then marked unevictable to avoid participation in
> swapping. The ->free_folio() handler is modified to unset the
> unevictable flag when the folio is released from guest_memfd.
> 
> This change ensures that LRU lists no longer take refcounts on
> guest_memfd folios, significantly reducing the chance of elevated
> refcounts during conversion.
> 
> To facilitate this, __filemap_add_folio is exported for KVM's use.
> 
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  mm/filemap.c           |  1 +
>  mm/memcontrol.c        |  2 ++
>  virt/kvm/guest_memfd.c | 60 +++++++++++++++++++++++++++++++++---------
>  3 files changed, 50 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 03f223be575ca..60c7c95bbd7e6 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -954,6 +954,7 @@ noinline int __filemap_add_folio(struct address_space *mapping,
>  	return xas_error(&xas);
>  }
>  ALLOW_ERROR_INJECTION(__filemap_add_folio, ERRNO);
> +EXPORT_SYMBOL_FOR_MODULES(__filemap_add_folio, "kvm");
>  
>  int filemap_add_folio(struct address_space *mapping, struct folio *folio,
>  				pgoff_t index, gfp_t gfp)
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 8dd7fbed5a942..fe8629414d0a9 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4721,6 +4721,7 @@ int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp)
>  
>  	return ret;
>  }
> +EXPORT_SYMBOL_FOR_MODULES(__mem_cgroup_charge, "kvm");
>  
>  /**
>   * mem_cgroup_charge_hugetlb - charge the memcg for a hugetlb folio
> @@ -4893,6 +4894,7 @@ void __mem_cgroup_uncharge(struct folio *folio)
>  	uncharge_folio(folio, &ug);
>  	uncharge_batch(&ug);
>  }
> +EXPORT_SYMBOL_FOR_MODULES(__mem_cgroup_uncharge, "kvm");
>  
>  void __mem_cgroup_uncharge_folios(struct folio_batch *folios)
>  {
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 2a9e9220a48aa..dab2b3ce78bc8 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -148,6 +148,41 @@ static struct mempolicy *kvm_gmem_get_folio_policy(struct gmem_inode *gi,
>  #endif
>  }
>  
> +static struct folio *__kvm_gmem_get_folio(struct address_space *mapping,
> +					  pgoff_t index,
> +					  struct mempolicy *policy)
> +{
> +	const gfp_t gfp = mapping_gfp_mask(mapping);
> +	struct folio *folio;
> +	int err;
> +
> +	folio = filemap_lock_folio(mapping, index);
> +	if (!IS_ERR(folio))
> +		return folio;
> +
> +	folio = filemap_alloc_folio(gfp, 0, policy);
> +	if (!folio)
> +		return ERR_PTR(-ENOMEM);
> +
> +	err = mem_cgroup_charge(folio, NULL, gfp);
> +	if (err)
> +		goto err_put;
> +
> +	__folio_set_locked(folio);
> +
> +	err = __filemap_add_folio(mapping, folio, index, gfp, NULL);
> +	if (err) {
> +		__folio_clear_locked(folio);
> +		goto err_put;
> +	}
> +
> +	return folio;
> +
> +err_put:
> +	folio_put(folio);
> +	return ERR_PTR(err);
> +}
> +
>  /*
>   * Returns a locked folio on success.  The caller is responsible for
>   * setting the up-to-date flag before the memory is mapped into the guest.
> @@ -160,6 +195,7 @@ static struct mempolicy *kvm_gmem_get_folio_policy(struct gmem_inode *gi,
>  static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
>  {
>  	/* TODO: Support huge pages. */
> +	struct address_space *mapping = inode->i_mapping;
>  	struct mempolicy *policy;
>  	struct folio *folio;
>  
> @@ -167,16 +203,17 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
>  	 * Fast-path: See if folio is already present in mapping to avoid
>  	 * policy_lookup.
>  	 */
> -	folio = filemap_lock_folio(inode->i_mapping, index);
> +	folio = filemap_lock_folio(mapping, index);
>  	if (!IS_ERR(folio))
>  		return folio;
>  
>  	policy = kvm_gmem_get_folio_policy(GMEM_I(inode), index);
> -	folio = __filemap_get_folio_mpol(inode->i_mapping, index,
> -					 FGP_LOCK | FGP_CREAT,
> -					 mapping_gfp_mask(inode->i_mapping), policy);
> -	mpol_cond_put(policy);
>  
> +	do {
> +		folio = __kvm_gmem_get_folio(mapping, index, policy);
> +	} while (IS_ERR(folio) && PTR_ERR(folio) == -EEXIST);
> +
> +	mpol_cond_put(policy);
>  	return folio;
>  }
>  
> @@ -588,24 +625,21 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
>  	return MF_DELAYED;
>  }
>  
> -#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
>  static void kvm_gmem_free_folio(struct folio *folio)
>  {
> -	struct page *page = folio_page(folio, 0);
> -	kvm_pfn_t pfn = page_to_pfn(page);
> -	int order = folio_order(folio);
> +	folio_clear_unevictable(folio);
>  
> -	kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
> -}
> +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
> +	kvm_arch_gmem_invalidate(folio_pfn(folio),
> +				 folio_pfn(folio) + folio_nr_pages(folio));
>  #endif
> +}
>  
>  static const struct address_space_operations kvm_gmem_aops = {
>  	.dirty_folio = noop_dirty_folio,
>  	.migrate_folio	= kvm_gmem_migrate_folio,
>  	.error_remove_folio = kvm_gmem_error_folio,
> -#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
>  	.free_folio = kvm_gmem_free_folio,
> -#endif
>  };
>  
>  static int kvm_gmem_setattr(struct mnt_idmap *idmap, struct dentry *dentry,



  reply	other threads:[~2025-10-27 13:56 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-17 20:11 [RFC PATCH v1 00/37] guest_memfd: In-place conversion support Ackerley Tng
2025-10-17 20:11 ` [RFC PATCH v1 01/37] KVM: guest_memfd: Introduce per-gmem attributes, use to guard user mappings Ackerley Tng
2025-10-27 13:27   ` Vlastimil Babka
2025-11-12  8:58   ` Binbin Wu
2025-10-17 20:11 ` [RFC PATCH v1 02/37] KVM: Rename KVM_GENERIC_MEMORY_ATTRIBUTES to KVM_VM_MEMORY_ATTRIBUTES Ackerley Tng
2025-10-17 20:11 ` [RFC PATCH v1 03/37] KVM: Enumerate support for PRIVATE memory iff kvm_arch_has_private_mem is defined Ackerley Tng
2025-11-13  1:42   ` Binbin Wu
2025-10-17 20:11 ` [RFC PATCH v1 04/37] KVM: Stub in ability to disable per-VM memory attribute tracking Ackerley Tng
2025-10-17 20:11 ` [RFC PATCH v1 05/37] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes Ackerley Tng
2025-10-17 20:11 ` [RFC PATCH v1 06/37] KVM: guest_memfd: Update kvm_gmem_populate() to use gmem attributes Ackerley Tng
2025-11-10 10:01   ` Yan Zhao
2025-11-15  0:52     ` Ackerley Tng
2025-10-17 20:11 ` [RFC PATCH v1 07/37] KVM: Introduce KVM_SET_MEMORY_ATTRIBUTES2 Ackerley Tng
2025-10-22 15:21   ` Steven Price
2025-10-22 16:51     ` Ackerley Tng
2025-10-22 22:45       ` Ackerley Tng
2025-10-22 23:30         ` Sean Christopherson
2025-10-23 14:01           ` Ackerley Tng
2025-10-23 15:05             ` Sean Christopherson
2025-10-24 14:36               ` Ackerley Tng
2025-10-24 15:11                 ` Sean Christopherson
2025-10-24 16:41                   ` Ackerley Tng
2025-10-24 17:45                     ` Sean Christopherson
2025-10-27 12:48                       ` Ackerley Tng
2025-10-17 20:11 ` [RFC PATCH v1 08/37] KVM: guest_memfd: Don't set FGP_ACCESSED when getting folios Ackerley Tng
2025-10-27 13:39   ` Vlastimil Babka
2025-10-17 20:11 ` [RFC PATCH v1 09/37] KVM: guest_memfd: Skip LRU for guest_memfd folios Ackerley Tng
2025-10-27 13:56   ` Vlastimil Babka [this message]
2025-10-17 20:11 ` [RFC PATCH v1 10/37] KVM: guest_memfd: Enable INIT_SHARED on guest_memfd for x86 Coco VMs Ackerley Tng
2025-10-17 20:11 ` [RFC PATCH v1 11/37] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES Ackerley Tng
2025-11-04  9:25   ` Yan Zhao
2025-11-04 15:29     ` Vishal Annapurve
2025-11-15  0:46     ` Ackerley Tng
2025-10-17 20:11 ` [RFC PATCH v1 12/37] KVM: Move KVM_VM_MEMORY_ATTRIBUTES config definition to x86 Ackerley Tng
2025-10-17 20:11 ` [RFC PATCH v1 13/37] KVM: Let userspace disable per-VM mem attributes, enable per-gmem attributes Ackerley Tng
2025-10-17 20:11 ` [RFC PATCH v1 14/37] KVM: selftests: Create gmem fd before "regular" fd when adding memslot Ackerley Tng
2025-10-17 20:11 ` [RFC PATCH v1 15/37] KVM: selftests: Rename guest_memfd{,_offset} to gmem_{fd,offset} Ackerley Tng
2025-10-17 20:11 ` [RFC PATCH v1 16/37] KVM: selftests: Add support for mmap() on guest_memfd in core library Ackerley Tng
2025-10-24 16:48   ` Ackerley Tng
2025-10-24 18:18     ` Sean Christopherson
2025-10-27 12:51       ` Ackerley Tng
2025-10-17 20:11 ` [RFC PATCH v1 17/37] KVM: selftests: Update framework to use KVM_SET_MEMORY_ATTRIBUTES2 Ackerley Tng
2025-10-17 20:11 ` [RFC PATCH v1 18/37] KVM: selftests: Add helpers for calling ioctls on guest_memfd Ackerley Tng
2025-10-17 20:12 ` [RFC PATCH v1 19/37] KVM: selftests: guest_memfd: Test basic single-page conversion flow Ackerley Tng
2025-10-17 20:12 ` [RFC PATCH v1 20/37] KVM: selftests: guest_memfd: Test conversion flow when INIT_SHARED Ackerley Tng
2025-10-17 20:12 ` [RFC PATCH v1 21/37] KVM: selftests: guest_memfd: Test indexing in guest_memfd Ackerley Tng
2025-10-17 20:12 ` [RFC PATCH v1 22/37] KVM: selftests: guest_memfd: Test conversion before allocation Ackerley Tng
2025-10-17 20:12 ` [RFC PATCH v1 23/37] KVM: selftests: guest_memfd: Convert with allocated folios in different layouts Ackerley Tng
2025-10-17 20:12 ` [RFC PATCH v1 24/37] KVM: selftests: guest_memfd: Test precision of conversion Ackerley Tng
2025-10-17 20:12 ` [RFC PATCH v1 25/37] KVM: selftests: guest_memfd: Test that truncation does not change shared/private status Ackerley Tng
2025-10-17 20:12 ` [RFC PATCH v1 26/37] KVM: selftests: guest_memfd: Test that shared/private status is consistent across processes Ackerley Tng
2025-10-17 23:33   ` Ackerley Tng
2025-10-17 20:12 ` [RFC PATCH v1 27/37] KVM: selftests: guest_memfd: Test conversion with elevated page refcount Ackerley Tng
2025-10-17 20:12 ` [RFC PATCH v1 28/37] KVM: selftests: Reset shared memory after hole-punching Ackerley Tng
2025-10-17 20:12 ` [RFC PATCH v1 29/37] KVM: selftests: Add selftests global for guest memory attributes capability Ackerley Tng
2025-10-17 20:12 ` [RFC PATCH v1 30/37] KVM: selftests: Provide function to look up guest_memfd details from gpa Ackerley Tng
2025-10-17 20:12 ` [RFC PATCH v1 31/37] KVM: selftests: Provide common function to set memory attributes Ackerley Tng
2025-10-17 20:12 ` [RFC PATCH v1 32/37] KVM: selftests: Check fd/flags provided to mmap() when setting up memslot Ackerley Tng
2025-10-17 20:12 ` [RFC PATCH v1 33/37] KVM: selftests: Make TEST_EXPECT_SIGBUS thread-safe Ackerley Tng
2025-10-17 20:12 ` [RFC PATCH v1 34/37] KVM: selftests: Update private_mem_conversions_test to mmap() guest_memfd Ackerley Tng
2025-10-17 20:12 ` [RFC PATCH v1 35/37] KVM: selftests: Add script to exercise private_mem_conversions_test Ackerley Tng
2025-10-17 20:12 ` [RFC PATCH v1 36/37] KVM: selftests: Update pre-fault test to work with per-guest_memfd attributes Ackerley Tng
2025-10-17 20:12 ` [RFC PATCH v1 37/37] KVM: selftests: Update private memory exits test work with per-gmem attributes Ackerley Tng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ce78da49-2525-44af-9cb8-301bf6a4658a@suse.cz \
    --to=vbabka@suse.cz \
    --cc=ackerleytng@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=binbin.wu@linux.intel.com \
    --cc=bp@alien8.de \
    --cc=brauner@kernel.org \
    --cc=cgroups@vger.kernel.org \
    --cc=chao.p.peng@intel.com \
    --cc=chenhuacai@kernel.org \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=dmatlack@google.com \
    --cc=erdemaktas@google.com \
    --cc=fan.du@intel.com \
    --cc=fvdl@google.com \
    --cc=haibo1.xu@intel.com \
    --cc=hannes@cmpxchg.org \
    --cc=hch@infradead.org \
    --cc=hpa@zytor.com \
    --cc=hughd@google.com \
    --cc=ira.weiny@intel.com \
    --cc=isaku.yamahata@intel.com \
    --cc=jack@suse.cz \
    --cc=james.morse@arm.com \
    --cc=jarkko@kernel.org \
    --cc=jgg@ziepe.ca \
    --cc=jgowans@amazon.com \
    --cc=jhubbard@nvidia.com \
    --cc=jroedel@suse.de \
    --cc=jthoughton@google.com \
    --cc=jun.miao@intel.com \
    --cc=kai.huang@intel.com \
    --cc=keirf@google.com \
    --cc=kent.overstreet@linux.dev \
    --cc=kvm@vger.kernel.org \
    --cc=liam.merwick@oracle.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=maciej.wieczor-retman@intel.com \
    --cc=mail@maciej.szmigiero.name \
    --cc=maobibo@loongson.cn \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=maz@kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=mhocko@kernel.org \
    --cc=mic@digikod.net \
    --cc=michael.roth@amd.com \
    --cc=mingo@redhat.com \
    --cc=mlevitsk@redhat.com \
    --cc=mpe@ellerman.id.au \
    --cc=muchun.song@linux.dev \
    --cc=nikunj@amd.com \
    --cc=nsaenz@amazon.es \
    --cc=oliver.upton@linux.dev \
    --cc=palmer@dabbelt.com \
    --cc=pankaj.gupta@amd.com \
    --cc=paul.walmsley@sifive.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=pgonda@google.com \
    --cc=prsampat@amd.com \
    --cc=pvorel@suse.cz \
    --cc=qperret@google.com \
    --cc=richard.weiyang@gmail.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=rientjes@google.com \
    --cc=rostedt@goodmis.org \
    --cc=roypat@amazon.co.uk \
    --cc=rppt@kernel.org \
    --cc=seanjc@google.com \
    --cc=shakeel.butt@linux.dev \
    --cc=shuah@kernel.org \
    --cc=steven.price@arm.com \
    --cc=steven.sistare@oracle.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tabba@google.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=vannapurve@google.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=vkuznets@redhat.com \
    --cc=wei.w.wang@intel.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=wyihan@google.com \
    --cc=x86@kernel.org \
    --cc=xiaoyao.li@intel.com \
    --cc=yan.y.zhao@intel.com \
    --cc=yilun.xu@intel.com \
    --cc=yuzenghui@huawei.com \
    --cc=zhiquan1.li@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox