Re: [RFC PATCH 30/39] KVM: guest_memfd: Handle folio preparation for guest_memfd mmap


From: Ackerley Tng <ackerleytng@google.com>
To: Patrick Roy <roypat@amazon.co.uk>
Cc: quic_eberman@quicinc.com, tabba@google.com, jgg@nvidia.com,
	 peterx@redhat.com, david@redhat.com, rientjes@google.com,
	fvdl@google.com,  jthoughton@google.com, seanjc@google.com,
	pbonzini@redhat.com,  zhiquan1.li@intel.com, fan.du@intel.com,
	jun.miao@intel.com,  isaku.yamahata@intel.com,
	muchun.song@linux.dev, mike.kravetz@oracle.com,
	 erdemaktas@google.com, vannapurve@google.com,
	qperret@google.com,  jhubbard@nvidia.com, willy@infradead.org,
	shuah@kernel.org,  brauner@kernel.org, bfoster@redhat.com,
	kent.overstreet@linux.dev,  pvorel@suse.cz, rppt@kernel.org,
	richard.weiyang@gmail.com,  anup@brainfault.org,
	haibo1.xu@intel.com, ajones@ventanamicro.com,
	 vkuznets@redhat.com, maciej.wieczor-retman@intel.com,
	pgonda@google.com,  oliver.upton@linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	 kvm@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-fsdevel@kvack.org,  jgowans@amazon.com,
	kalyazin@amazon.co.uk, derekmn@amazon.com
Subject: Re: [RFC PATCH 30/39] KVM: guest_memfd: Handle folio preparation for guest_memfd mmap
Date: Tue, 08 Oct 2024 18:07:18 +0000
Message-ID: <diqz1q0qtqnd.fsf@ackerleytng-ctop.c.googlers.com>
In-Reply-To: <e8f55fef-1821-408e-88ed-b25200ef66c9@amazon.co.uk> (message from Patrick Roy on Mon, 7 Oct 2024 16:56:42 +0100)

Patrick Roy <roypat@amazon.co.uk> writes:

> Hi Ackerley,
>
> On Thu, 2024-10-03 at 22:32 +0100, Ackerley Tng wrote:
>> Elliot Berman <quic_eberman@quicinc.com> writes:
>>
>>> On Tue, Sep 10, 2024 at 11:44:01PM +0000, Ackerley Tng wrote:
>>>> Since guest_memfd now supports mmap(), folios have to be prepared
>>>> before they are faulted into userspace.
>>>>
>>>> When memory attributes are switched between shared and private, the
>>>> up-to-date flags will be cleared.
>>>>
>>>> Use the folio's up-to-date flag to indicate that the folio is ready
>>>> for guest usage. The flag marks whether the folio is ready for shared
>>>> OR private use.
>>>
>>> Clearing the up-to-date flag also means that the page gets zero'd out
>>> whenever it transitions between shared and private (either direction).
>>> pKVM (Android) hypervisor policy can allow in-place conversion between
>>> shared/private.
>>>
>>> I believe the important thing is that sev_gmem_prepare() needs to be
>>> called prior to giving page to guest. In my series, I had made a
>>> ->prepare_inaccessible() callback where KVM would only do this part.
>>> When transitioning to inaccessible, only that callback would be made,
>>> besides the bookkeeping. The folio zeroing happens once when allocating
>>> the folio if the folio is initially accessible (faultable).
>>>
>>> From an x86 CoCo perspective, I think it also makes sense to not zero
>>> the folio when changing faultability from private to shared:
>>>  - If guest is sharing some data with host, you've wiped the data and
>>>    guest has to copy again.
>>>  - Or, if SEV/TDX enforces that page is zero'd between transitions,
>>>    Linux has duplicated the work that trusted entity has already done.
>>>
>>> Fuad and I can help add some details for the conversion. Hopefully we
>>> can figure out some of the plan at plumbers this week.
>>
>> Zeroing the page prevents leaking host data (see function docstring for
>> kvm_gmem_prepare_folio() introduced in [1]), so we definitely don't want
>> to introduce a kernel data leak bug here.
>>
>> In-place conversion does require preservation of data, so for
>> conversions, shall we zero depending on VM type?
>>
>> + Gunyah: don't zero since ->prepare_inaccessible() is a no-op
>> + pKVM: don't zero
>> + TDX: don't zero
>> + SEV: AMD Architecture Programmers Manual 7.10.6 says there is no
>>   automatic encryption and implies no zeroing, hence perform zeroing
>> + KVM_X86_SW_PROTECTED_VM: Doesn't have a formal definition so I guess
>>   we could require zeroing on transition?
>
> Maybe for KVM_X86_SW_PROTECTED_VM we could make zero-ing configurable
> via some CREATE_GUEST_MEMFD flag, instead of forcing one specific
> behavior.

Sounds good to me, I can set up a flag in the next revision.
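
To make the proposal above concrete, here is a minimal userspace sketch
of the per-VM-type policy with a creation-time flag. Every name in it
(the sketch_vm_type enum, SKETCH_GMEM_ZERO_ON_CONVERSION,
sketch_zero_on_conversion()) is hypothetical and made up for
illustration; none of it is existing KVM uAPI or code from this series,
and the real flag may end up looking quite different.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical VM types for illustration; NOT the kernel's KVM_X86_* values. */
enum sketch_vm_type {
	SKETCH_VM_GUNYAH,
	SKETCH_VM_PKVM,
	SKETCH_VM_TDX,
	SKETCH_VM_SEV,
	SKETCH_VM_SW_PROTECTED,
};

/* Hypothetical guest_memfd creation flag; name and bit are assumptions. */
#define SKETCH_GMEM_ZERO_ON_CONVERSION (1ULL << 0)

/*
 * Returns true if a folio should be zeroed when it is converted between
 * shared and private, following the per-VM-type list quoted above.
 */
static bool sketch_zero_on_conversion(enum sketch_vm_type type,
				      uint64_t gmem_flags)
{
	switch (type) {
	case SKETCH_VM_SEV:
		/* No zeroing implied by the architecture: zero in software. */
		return true;
	case SKETCH_VM_SW_PROTECTED:
		/* Userspace chooses the behavior at guest_memfd creation. */
		return (gmem_flags & SKETCH_GMEM_ZERO_ON_CONVERSION) != 0;
	case SKETCH_VM_GUNYAH:
	case SKETCH_VM_PKVM:
	case SKETCH_VM_TDX:
		/* In-place conversion must preserve data. */
		return false;
	}
	/* Unknown type: zero, erring on the side of not leaking data. */
	return true;
}
```

With this shape, the flag only changes behavior for the software-protected
type; for the other types it is ignored, which may or may not be what we
want (rejecting it at creation time is another option).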

> For the "non-CoCo with direct map entries removed" VMs that we at AWS
> are going for, we'd like a VM type with host-controlled in-place
> conversions which doesn't zero on transitions, so if
> KVM_X86_SW_PROTECTED_VM ends up zeroing, we'd need to add another new VM
> type for that.
>
> Somewhat related sidenote: For VMs that allow in-place conversions and do
> not zero, we do not need to zap the stage-2 mappings on memory attribute
> changes, right?
>

Here are some reasons for zapping I can think of:

1. When private pages are split/merged, zapping the stage-2 mappings on
   memory attribute changes allows the private pages to be re-faulted by
   KVM at smaller/larger granularity.

2. The rationale described here
   https://elixir.bootlin.com/linux/v6.11.2/source/arch/x86/kvm/mmu/mmu.c#L7482
   ("Zapping SPTEs in this case ensures KVM will reassess whether or not
   a hugepage can be used for affected ranges.") probably refers to the
   existing implementation, when a different set of physical pages is
   used to back shared and private memory. When the same set of physical
   pages is used for both shared and private memory, then IIUC this
   rationale does not apply.

3. There's another rationale for zapping, to do with read vs write
   mappings, at
   https://elixir.bootlin.com/linux/v6.11.2/source/virt/kvm/kvm_main.c#L2494
   I don't fully understand it. Does this rationale still apply?

4. Is zapping required if the pages get removed from/added to the kernel
   direct map?

>> This way, the uptodate flag means that it has been prepared (as in
>> sev_gmem_prepare()), and zeroed if required by VM type.
>>
>> Regarding flushing the dcache/tlb in your other question [2], if we
>> don't use folio_zero_user(), can we rely on unmapping within core-mm
>> to flush after shared use, and unmapping within KVM to flush after
>> private use?
>>
>> Or should flush_dcache_folio() be explicitly called on kvm_gmem_fault()?
>>
>> clear_highpage(), used in the non-hugetlb (original) path, doesn't flush
>> the dcache. Was that intended?
>>
>>> Thanks,
>>> Elliot
>>>
>>>>
>>>> <snip>
>>
>> [1] https://lore.kernel.org/all/20240726185157.72821-8-pbonzini@redhat.com/
>> [2] https://lore.kernel.org/all/diqz34ldszp3.fsf@ackerleytng-ctop.c.googlers.com/
>
> Best,
> Patrick
