linux-mm.kvack.org archive mirror
From: Patrick Roy <roypat@amazon.co.uk>
To: David Hildenbrand <david@redhat.com>, <seanjc@google.com>,
	Fuad Tabba <tabba@google.com>
Cc: <pbonzini@redhat.com>, <akpm@linux-foundation.org>,
	<dwmw@amazon.co.uk>, <rppt@kernel.org>, <tglx@linutronix.de>,
	<mingo@redhat.com>, <bp@alien8.de>, <dave.hansen@linux.intel.com>,
	<x86@kernel.org>, <hpa@zytor.com>, <willy@infradead.org>,
	<graf@amazon.com>, <derekmn@amazon.com>, <kalyazin@amazon.com>,
	<kvm@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-mm@kvack.org>, <dmatlack@google.com>,
	<chao.p.peng@linux.intel.com>, <xmarcalx@amazon.co.uk>,
	James Gowans <jgowans@amazon.com>
Subject: Re: [RFC PATCH 8/8] kvm: gmem: Allow restricted userspace mappings
Date: Thu, 1 Aug 2024 11:30:21 +0100
Message-ID: <e8663138-b75b-472d-8dcc-589b2ef91e53@amazon.co.uk>
In-Reply-To: <ab528aa0-d4a5-4661-9715-43eb1681cfef@redhat.com>

On Tue, 2024-07-30 at 11:15 +0100, David Hildenbrand wrote:
>>> Hi,
>>>
>>> sorry for the late reply. Yes, you could have joined .... too late.
>>
>> No worries, I did end up joining to listen in to y'all's discussion
>> anyway :)
> 
> Sorry for the late reply :(

No worries :)

>>
>>> There will be a summary posted soon. So far the agreement is that we're
>>> planning on allowing shared memory as part of guest_memfd, and will allow
>>> that to get mapped and pinned. Private memory is not going to get mapped
>>> and pinned.
>>>
>>> If we have to disallow pinning of shared memory on top for some use
>>> cases (i.e., no directmap), I assume that could be added.
>>>
>>>>
>>>>> Note that just from staring at this commit, I don't understand the
>>>>> motivation *why* we would want to do that.
>>>>
>>>> Fair - I admittedly didn't get into that as much as I probably should
>>>> have. In our usecase, we do not have anything that pKVM would (I think)
>>>> call "guest-private" memory. I think our memory can be better described
>>>> as guest-owned, but always shared with the VMM (e.g. userspace), but
>>>> ideally never shared with the host kernel. This model lets us do a lot
>>>> of simplifying assumptions: Things like I/O can be handled in userspace
>>>> without the guest explicitly sharing I/O buffers (which is not exactly
>>>> what we would want long-term anyway, as sharing in the guest_memfd
>>>> context means sharing with the host kernel), we can easily do VM
>>>> snapshotting without needing things like TDX's TDH.EXPORT.MEM APIs, etc.
>>>
>>> Okay, so essentially you would want to use guest_memfd to only contain
>>> shared memory and disallow any pinning like for secretmem.
>>
>> Yeah, this is pretty much what I thought we wanted before listening in
>> on Wednesday.
>>
>> I've actually been thinking about this some more since then though. With
>> hugepages, if the VM is backed by, say, 2M pages, our on-demand direct
>> map insertion approach runs into the same problem that CoCo VMs have
>> when they're backed by hugepages: How to deal with the guest only
>> sharing a 4K range in a hugepage? If we want to restore the direct map
>> for e.g. the page containing kvm-clock data, then we can't simply go
>> ahead and restore the direct map for the entire 2M page, because there
>> very well might be stuff in the other 511 small guest pages that we
>> really do not want in the direct map. And we can't even take the
> 
> Right, you'd only want to restore the direct map for a fragment. Or
> dynamically map that fragment using kmap where required (as raised by
> Vlastimil).

Can the kmap approach work if the memory is supposed to be GUP-able?

>> approach of letting the guest deal with the problem, because here
>> "sharing" is driven by the host, not the guest, so the guest cannot
>> possibly know that it maybe should avoid putting stuff it doesn't want
>> shared into those remaining 511 pages! To me that sounds a lot like the
>> whole "breaking down huge folios to allow GUP to only some parts of it"
>> thing mentioned on Wednesday.
> 
> Yes. While it would be one logical huge page, it would be exposed to the
> remainder of the kernel as 512 individual pages.
> 
>>
>> Now, if we instead treat "guest memory without direct map entries" as
>> "private", and "guest memory with direct map entries" as "shared", then
>> the above will be solved by whatever mechanism allows gupping/mapping of
>> only the "shared" parts of huge folios, IIUC. The fact that GUP is then
>> also allowed for the "shared" parts is not actually a problem for us -
>> we went down the route of disabling GUP altogether here because based on
>> [1] it sounded like GUP for anything gmem related would never happen.
> 
> Right. Might there also be a case for removing the directmap for shared
> memory or is that not really a requirement so far?

No, not really - we would only mark as "shared" memory that _needs_ to
be in the direct map for functional reasons (e.g. MMIO instruction
emulation, etc.).
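
To make the hugepage problem above a bit more concrete, here is a minimal
userspace sketch of tracking "shared" (direct-map-present) state per 4K
subpage of a single 2M page. It is only an illustration: `struct
huge_folio_state` and the `mark_*`/`is_shared`/`count_private` helpers are
hypothetical names made up for this example, not anything from this series
or from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SMALL_PAGE_SIZE   4096u
#define HUGE_PAGE_SIZE    (2u * 1024u * 1024u)
#define SUBPAGES_PER_HUGE (HUGE_PAGE_SIZE / SMALL_PAGE_SIZE) /* 512 */

/* Hypothetical per-hugepage state: one bit per 4K subpage, set = "shared". */
struct huge_folio_state {
	uint64_t shared[SUBPAGES_PER_HUGE / 64];
};

/* Mark the 4K subpage containing byte `offset` as shared. */
static void mark_shared(struct huge_folio_state *s, size_t offset)
{
	size_t idx = offset / SMALL_PAGE_SIZE;

	assert(idx < SUBPAGES_PER_HUGE);
	s->shared[idx / 64] |= UINT64_C(1) << (idx % 64);
}

/* Mark the 4K subpage containing byte `offset` as private again. */
static void mark_private(struct huge_folio_state *s, size_t offset)
{
	size_t idx = offset / SMALL_PAGE_SIZE;

	assert(idx < SUBPAGES_PER_HUGE);
	s->shared[idx / 64] &= ~(UINT64_C(1) << (idx % 64));
}

/* Is the 4K subpage containing byte `offset` currently shared? */
static bool is_shared(const struct huge_folio_state *s, size_t offset)
{
	size_t idx = offset / SMALL_PAGE_SIZE;

	assert(idx < SUBPAGES_PER_HUGE);
	return (s->shared[idx / 64] >> (idx % 64)) & 1;
}

/* Number of 4K subpages that must stay out of the direct map. */
static size_t count_private(const struct huge_folio_state *s)
{
	size_t idx, n = 0;

	for (idx = 0; idx < SUBPAGES_PER_HUGE; idx++)
		if (!((s->shared[idx / 64] >> (idx % 64)) & 1))
			n++;
	return n;
}
```

With a zero-initialized state, sharing only the subpage that holds e.g. the
kvm-clock data leaves the other 511 subpages private, which is exactly the
granularity that restoring the direct map for the whole 2M page would lose.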

>> But after something is re-inserted into the direct map, we don't much
>> care if it can be GUP-ed or not. In fact, allowing GUP for the
>> shared parts probably makes some things easier for us, as we can then do
>> I/O without bounce buffers by just in-place converting I/O-buffers to
>> shared, and then treating that shared slice of guest_memfd the same way
>> we treat traditional guest memory today.
> 
> Yes.
> 
>> In a very far-off future, we'd
>> like to be able to do I/O without ever reinserting pages into the direct
>> map, but I don't think adopting this private/shared model for gmem would
>> block us from doing that?
> 
> How would that I/O get triggered? GUP would require the directmap.

I was hoping that this "phyr" thing Matthew has been talking about [1]
would allow somehow doing I/O without direct map entries/GUP, but maybe
I am misunderstanding something.

>>
>> Although all of this does hinge on us being able to do the in-place
>> shared/private conversion without any guest involvement. Do you envision
>> that to be possible?
> 
> Who would trigger the conversion and how? I don't see a reason why --
> for your use case -- user space shouldn't be able to trigger conversion
> private <-> shared. At least nothing fundamental comes to mind that
> would prohibit that.

Either KVM itself would trigger the conversions whenever it wants to
access gmem (e.g. each place in this series where there is a
set_direct_map_{invalid,default} it would do a shared/private
conversion), or userspace would do it via some syscall/ioctl (the one
place I can think of right now is I/O, where the VMM receives a virtio
buffer from the guest and converts it from private to shared in-place.
Although I guess 2 syscalls for each I/O operation aren't great
perf-wise, so maybe swiotlb still wins out here?).
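
As a rough model of the userspace-driven variant, the sketch below counts
the conversions needed for one in-place I/O operation. Everything in it is
an assumption for illustration: `struct gmem_range`, `gmem_convert()` and
`do_io_in_place()` are hypothetical stand-ins for whatever syscall/ioctl
would end up doing the conversion, and no such interface exists in this
series.

```c
#include <assert.h>

/* Hypothetical state of a slice of guest_memfd. */
enum gmem_state { GMEM_PRIVATE, GMEM_SHARED };

struct gmem_range {
	enum gmem_state state;
	unsigned long conversions; /* stands in for the syscall/ioctl count */
};

/*
 * Hypothetical conversion call. Returns 1 if a conversion actually
 * happened (i.e. one syscall was paid for), 0 if it was a no-op.
 */
static int gmem_convert(struct gmem_range *r, enum gmem_state to)
{
	if (r->state == to)
		return 0;
	r->state = to;
	r->conversions++;
	return 1;
}

/*
 * One in-place I/O operation on a virtio buffer: convert the buffer to
 * shared, do the I/O, convert it back to private. Returns 0 on success.
 */
static int do_io_in_place(struct gmem_range *r)
{
	int ok;

	gmem_convert(r, GMEM_SHARED);
	/* The actual I/O would go here; it requires the shared state. */
	ok = (r->state == GMEM_SHARED);
	gmem_convert(r, GMEM_PRIVATE);

	return ok ? 0 : -1;
}
```

Starting from a private buffer, each call to `do_io_in_place()` pays for two
conversions, which is the per-I/O overhead that would have to be weighed
against bouncing through swiotlb.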

I actually see that Fuad just posted an RFC series that implements the
basic shared/private handling [2], so I will probably also comment about
this over there after I have had a closer look :)

> -- 
> Cheers,
> 
> David / dhildenb

Best, 
Patrick

[1]: https://lore.kernel.org/netdev/Yd0IeK5s%2FE0fuWqn@casper.infradead.org/T/
[2]: https://lore.kernel.org/kvm/20240801090117.3841080-1-tabba@google.com/T/#t

