From: Amit Shah <amit@infradead.org>
To: Ackerley Tng <ackerleytng@google.com>
Cc: tabba@google.com, quic_eberman@quicinc.com, roypat@amazon.co.uk,
jgg@nvidia.com, peterx@redhat.com, david@redhat.com,
rientjes@google.com, fvdl@google.com, jthoughton@google.com,
seanjc@google.com, pbonzini@redhat.com, zhiquan1.li@intel.com,
fan.du@intel.com, jun.miao@intel.com, isaku.yamahata@intel.com,
muchun.song@linux.dev, mike.kravetz@oracle.com,
erdemaktas@google.com, vannapurve@google.com,
qperret@google.com, jhubbard@nvidia.com, willy@infradead.org,
shuah@kernel.org, brauner@kernel.org, bfoster@redhat.com,
kent.overstreet@linux.dev, pvorel@suse.cz, rppt@kernel.org,
richard.weiyang@gmail.com, anup@brainfault.org,
haibo1.xu@intel.com, ajones@ventanamicro.com,
vkuznets@redhat.com, maciej.wieczor-retman@intel.com,
pgonda@google.com, oliver.upton@linux.dev,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kvm@vger.kernel.org, linux-kselftest@vger.kernel.org,
linux-fsdevel@kvack.org
Subject: Re: [RFC PATCH 00/39] 1G page support for guest_memfd
Date: Thu, 06 Feb 2025 12:07:58 +0100
Message-ID: <f8030dfc5086e4e4e3709d6fcdab1e38f01fc38d.camel@infradead.org>
In-Reply-To: <diqzr04fpgsf.fsf@ackerleytng-ctop-specialist.c.googlers.com>

On Mon, 2025-02-03 at 08:35 +0000, Ackerley Tng wrote:
> Amit Shah <amit@infradead.org> writes:
>
> > Hey Ackerley,
>
> Hi Amit,
>
> > On Tue, 2024-09-10 at 23:43 +0000, Ackerley Tng wrote:
> > > Hello,
> > >
> > > This patchset is our exploration of how to support 1G pages in
> > > guest_memfd, and how the pages will be used in Confidential VMs.
> >
> > We've discussed this patchset at LPC and in the guest-memfd calls.
> > Can you please summarise the discussions here as a follow-up, so we
> > can also continue discussing on-list, and not repeat things that are
> > already discussed?
>
> Thanks for this question! Since LPC, Vishal and I have been tied up
> with some Google internal work, which slowed down progress on 1G page
> support for guest_memfd. We will be making progress on it this quarter
> and over the next few quarters.
>
> The related updates are
>
> 1. No objections to using hugetlb as the source of 1G pages.
>
> 2. Prerequisite hugetlb changes.
>
>    + I've separated some of the prerequisite hugetlb changes into
>      another patch series, hoping to have them merged ahead of and
>      separately from this patchset [1].
>    + Peter Xu contributed a better patchset, including a bugfix [2].
>    + I have an alternative [3].
>    + The next revision of this series (1G page support for
>      guest_memfd) will be based on alternative [3]. I think there
>      should be no issues there.
>    + I believe Peter is also waiting on the next revision before we
>      make further progress on, or decide between, [2] and [3].
>
> 3. No objections to allowing mmap()-ing of guest_memfd physical
>    memory when the memory is marked shared, to avoid
>    double-allocation.
>
> 4. No objections to splitting pages when they are marked shared.
>
> 5. folio_put() callback for guest_memfd folio cleanup/merging.
>
>    + In Fuad's series [4], Fuad used the callback to reset the folio's
>      mappability status.
>    + The catch is that the callback is only invoked when
>      folio->page_type == PGTY_guest_memfd, and folio->page_type is a
>      union with the folio's mapcount, so any folio with a non-zero
>      mapcount cannot have a valid page_type.
>    + I was concerned that we might not get a callback, and hence
>      unintentionally skip merging pages and fail to correctly restore
>      hugetlb pages.
>    + This was discussed at the last guest_memfd upstream call
>      (2025-01-23 07:58 PST), and the conclusion is that using
>      folio->page_type works, because:
>      + We only merge folios in two cases: (1) when converting to
>        private, and (2) when truncating folios (removing them from
>        the filemap).
>      + When converting to private, in (1), we can forcibly unmap all
>        the converted pages or check that the mapcount is 0; once the
>        mapcount is 0, we can install the callback by setting
>        folio->page_type = PGTY_guest_memfd.
>      + When truncating, we will be unmapping the folios anyway, so the
>        mapcount is also 0 and we can install the callback.
>
> Hope that covers the points that you're referring to. If there are
> other parts that you'd like to know the status on, please let me know
> which aspects those are!

Thank you for the nice summary!
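
On point 5: if I follow the page_type trick correctly, the invariant is
that only a fully unmapped folio can carry the callback marker. Here's
a rough sketch of the install step as I picture it -- to be clear, this
is not code from this series or Fuad's, the helper name is made up, and
the exact page_type encoding will certainly differ:

    /* Sketch only: mark a folio for the guest_memfd folio_put()
     * callback once nothing maps it. */
    #include <linux/mm.h>

    static int kvm_gmem_install_put_callback(struct folio *folio)
    {
            /*
             * folio->page_type shares storage with the mapcount, so
             * the marker is only representable while the folio is
             * unmapped: conversion to private unmaps the range first,
             * or bails out if a mapping remains.
             */
            if (folio_mapcount(folio) != 0)
                    return -EBUSY;

            /*
             * Stand-in for the real setter; after this, the last
             * folio_put() reaches the guest_memfd callback that
             * merges/restores the hugetlb folio.
             */
            folio->page.page_type = PGTY_guest_memfd;
            return 0;
    }

And the truncation path then gets this for free, since removal from the
filemap unmaps the folios anyway.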
> > Also - as mentioned in those meetings, we at AMD are interested in
> > this series along with SEV-SNP support - and I'm also interested in
> > figuring out how we collaborate on the evolution of this series.
>
> Thanks for all your help and comments during the guest_memfd upstream
> calls, and thanks for the help from AMD.
>
> Extending Fuad's mmap() support with 1G page support introduces more
> states, which makes it more complicated (at least for me).
>
> I'm modeling the states in Python so I can iterate more quickly. I
> also have usage flows (e.g. allocate, guest_use, host_use,
> transient_folio_get, close, transient_folio_put) as test cases.
>
> I'm almost done with the model, and my next steps are to write up a
> state machine (like Fuad's [5]) and share that.
>
> I'd be happy to share the Python model too, but I have to work through
> some internal open-sourcing processes first, so if you think this will
> be useful, let me know!

No problem. Yes, I'm interested in this - it'll be helpful!
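
For what it's worth, here's the rough shape I'd expect such a model to
take -- a hypothetical sketch (in C rather than Python, with made-up
states; only the event names come from your list, plus the convert
events that your flows imply):

    #include <stdbool.h>
    #include <stdio.h>

    /* Made-up states; the real model's states will differ. */
    enum gmem_state { G_UNALLOCATED, G_PRIVATE, G_SHARED };

    enum gmem_event {
            EV_ALLOCATE, EV_GUEST_USE, EV_HOST_USE,
            EV_TRANSIENT_FOLIO_GET, EV_TRANSIENT_FOLIO_PUT,
            EV_CONVERT_SHARED, EV_CONVERT_PRIVATE, EV_CLOSE,
    };

    struct gmem_model {
            enum gmem_state state;
            int transient_refs;  /* outstanding transient folio_get()s */
    };

    /* Returns false for transitions the model should reject. */
    static bool gmem_step(struct gmem_model *m, enum gmem_event ev)
    {
            switch (ev) {
            case EV_ALLOCATE:
                    if (m->state != G_UNALLOCATED)
                            return false;
                    m->state = G_PRIVATE;  /* assuming private-by-default */
                    return true;
            case EV_GUEST_USE:
                    return m->state == G_PRIVATE;
            case EV_HOST_USE:
                    /* host use (mmap) only when shared, per point 3 */
                    return m->state == G_SHARED;
            case EV_CONVERT_SHARED:
                    if (m->state != G_PRIVATE)
                            return false;
                    m->state = G_SHARED;  /* 1G folio gets split here */
                    return true;
            case EV_CONVERT_PRIVATE:
                    /* merging back presumably waits for transient refs */
                    if (m->state != G_SHARED || m->transient_refs != 0)
                            return false;
                    m->state = G_PRIVATE;
                    return true;
            case EV_TRANSIENT_FOLIO_GET:
                    if (m->state == G_UNALLOCATED)
                            return false;
                    m->transient_refs++;
                    return true;
            case EV_TRANSIENT_FOLIO_PUT:
                    if (m->transient_refs == 0)
                            return false;
                    m->transient_refs--;
                    return true;
            case EV_CLOSE:
                    /* hugetlb restore must wait for outstanding refs */
                    m->state = G_UNALLOCATED;
                    return m->transient_refs == 0;
            }
            return false;
    }

    int main(void)
    {
            struct gmem_model m = { G_UNALLOCATED, 0 };

            /* one usage flow as a test case: allocate, guest_use, close */
            bool ok = gmem_step(&m, EV_ALLOCATE) &&
                      gmem_step(&m, EV_GUEST_USE) &&
                      gmem_step(&m, EV_CLOSE);
            printf("flow %s\n", ok ? "accepted" : "rejected");
            return 0;
    }

Even a table of (state, event) -> (new state, valid?) in that shape
would make review of the next revision much easier.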

The other thing of note is that while we have the kernel patches, a
userspace to drive and exercise them is currently missing.

> Then, I'll code it all up in a new revision of this series (target:
> March 2025), which will be accompanied by source code on GitHub.
>
> I'm happy to collaborate more closely; let me know if you have ideas
> for collaboration!

Thank you. I think the bigger problem we currently have is the
allocation of hugepages -- which is also blocking a lot of the
follow-on work. Vishal briefly mentioned isolating pages from Linux
entirely last time; that's also what I'm interested in, to figure out
whether we can completely bypass the allocation problem by not
allocating struct pages for non-host-use pages at all. The
guest_memfs/KHO/kexec/live-update patches also take this approach at
AWS (for how their VMs are launched). If we work with those patches
together, allocation of 1G hugepages is simplified. I'd like to discuss
these themes further to see whether this approach helps as well.

Amit