From: David Hildenbrand <david@redhat.com>
To: Patrick Roy <roypat@amazon.co.uk>,
seanjc@google.com, Fuad Tabba <tabba@google.com>
Cc: pbonzini@redhat.com, akpm@linux-foundation.org,
dwmw@amazon.co.uk, rppt@kernel.org, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
x86@kernel.org, hpa@zytor.com, willy@infradead.org,
graf@amazon.com, derekmn@amazon.com, kalyazin@amazon.com,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, dmatlack@google.com,
chao.p.peng@linux.intel.com, xmarcalx@amazon.co.uk,
James Gowans <jgowans@amazon.com>
Subject: Re: [RFC PATCH 8/8] kvm: gmem: Allow restricted userspace mappings
Date: Tue, 30 Jul 2024 12:15:56 +0200
Message-ID: <ab528aa0-d4a5-4661-9715-43eb1681cfef@redhat.com>
In-Reply-To: <e26ec0bb-3c20-4732-a09b-83b6b6a6419a@amazon.co.uk>
>> Hi,
>>
>> sorry for the late reply. Yes, you could have joined .... too late.
>
> No worries, I did end up joining to listen in to y'all's discussion
> anyway :)
Sorry for the late reply :(
>
>> There will be a summary posted soon. So far the agreement is that we're
>> planning on allowing shared memory as part of guest_memfd, and will allow
>> that to get mapped and pinned. Private memory is not going to get mapped
>> and pinned.
>>
>> If we have to disallow pinning of shared memory on top for some use
>> cases (i.e., no directmap), I assume that could be added.
>>
>>>
>>>> Note that just from staring at this commit, I don't understand the
>>>> motivation *why* we would want to do that.
>>>
>>> Fair - I admittedly didn't get into that as much as I probably should
>>> have. In our usecase, we do not have anything that pKVM would (I think)
>>> call "guest-private" memory. I think our memory can be better described
>>> as guest-owned, but always shared with the VMM (e.g. userspace), but
>>> ideally never shared with the host kernel. This model lets us do a lot
>>> of simplifying assumptions: Things like I/O can be handled in userspace
>>> without the guest explicitly sharing I/O buffers (which is not exactly
>>> what we would want long-term anyway, as sharing in the guest_memfd
>>> context means sharing with the host kernel), we can easily do VM
>>> snapshotting without needing things like TDX's TDH.EXPORT.MEM APIs, etc.
>>
>> Okay, so essentially you would want to use guest_memfd to only contain
>> shared memory and disallow any pinning like for secretmem.
>
> Yeah, this is pretty much what I thought we wanted before listening in
> on Wednesday.
>
> I've actually been thinking about this some more since then though. With
> hugepages, if the VM is backed by, say, 2M pages, our on-demand direct
> map insertion approach runs into the same problem that CoCo VMs have
> when they're backed by hugepages: How to deal with the guest only
> sharing a 4K range in a hugepage? If we want to restore the direct map
> for e.g. the page containing kvm-clock data, then we can't simply go
> ahead and restore the direct map for the entire 2M page, because there
> very well might be stuff in the other 511 small guest pages that we
> really do not want in the direct map. And we can't even take the
Right, you'd only want to restore the direct map for a fragment. Or
dynamically map that fragment using kmap where required (as raised by
Vlastimil).
> approach of letting the guest deal with the problem, because here
> "sharing" is driven by the host, not the guest, so the guest cannot
> possibly know that it maybe should avoid putting stuff it doesn't want
> shared into those remaining 511 pages! To me that sounds a lot like the
> whole "breaking down huge folios to allow GUP to only some parts of it"
> thing mentioned on Wednesday.
Yes. While it would be one logical huge page, it would be exposed to the
remainder of the kernel as 512 individual pages.
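
To make the 2M/4K arithmetic concrete, here is a minimal sketch of how
one would locate the single 4K fragment (and its index among the 512)
inside a 2M huge page. It assumes x86-64 page sizes, and all helper
names are made up for illustration; none of this is from the series or
an existing kernel interface.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes: 4K small pages inside a 2M huge page (x86-64). */
#define SMALL_PAGE_SHIFT  12u
#define HUGE_PAGE_SHIFT   21u
#define SMALL_PAGE_SIZE   (UINT64_C(1) << SMALL_PAGE_SHIFT)
#define HUGE_PAGE_SIZE    (UINT64_C(1) << HUGE_PAGE_SHIFT)
#define SUBPAGES_PER_HUGE (HUGE_PAGE_SIZE / SMALL_PAGE_SIZE)

/* One 2M page is exposed as 512 individual 4K pages. */
static_assert(SUBPAGES_PER_HUGE == 512, "2M / 4K must be 512");

/* Hypothetical helper: start of the 2M huge page containing addr. */
static inline uint64_t huge_page_base(uint64_t addr)
{
	return addr & ~(HUGE_PAGE_SIZE - 1);
}

/* Hypothetical helper: index (0..511) of the 4K fragment within its 2M page. */
static inline unsigned int subpage_index(uint64_t addr)
{
	return (unsigned int)((addr & (HUGE_PAGE_SIZE - 1)) >> SMALL_PAGE_SHIFT);
}

/* Hypothetical helper: start of the 4K fragment containing addr. */
static inline uint64_t subpage_base(uint64_t addr)
{
	return addr & ~(SMALL_PAGE_SIZE - 1);
}
```

For an address such as 0x40201234, this yields huge page base
0x40200000, fragment index 1 and fragment base 0x40201000, so restoring
the direct map for that one fragment would leave the other 511 small
pages untouched.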
>
> Now, if we instead treat "guest memory without direct map entries" as
> "private", and "guest memory with direct map entries" as "shared", then
> the above will be solved by whatever mechanism allows gupping/mapping of
> only the "shared" parts of huge folios, IIUC. The fact that GUP is then
> also allowed for the "shared" parts is not actually a problem for us -
> we went down the route of disabling GUP altogether here because based on
> [1] it sounded like GUP for anything gmem related would never happen.
Right. Might there also be a case for removing the directmap for shared
memory or is that not really a requirement so far?
> But after something is re-inserted into the direct map, we don't very
> much care if it can be GUP-ed or not. In fact, allowing GUP for the
> shared parts probably makes some things easier for us, as we can then do
> I/O without bounce buffers by just in-place converting I/O-buffers to
> shared, and then treating that shared slice of guest_memfd the same way
> we treat traditional guest memory today.
Yes.
> In a very far-off future, we'd
> like to be able to do I/O without ever reinserting pages into the direct
> map, but I don't think adopting this private/shared model for gmem would
> block us from doing that?
How would that I/O get triggered? GUP would require the directmap.
>
> Although all of this does hinge on us being able to do the in-place
> shared/private conversion without any guest involvement. Do you envision
> that to be possible?
Who would trigger the conversion and how? I don't see a reason why --
for your use case -- user space shouldn't be able to trigger conversion
private <-> shared. At least nothing fundamental comes to mind that
would prohibit that.
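
Purely as a toy model of what such a conversion would have to track
(not a proposal for an interface, and all names below are hypothetical):
one bit per 4K fragment of a 2M folio, where "shared" stands for "has
direct map entries, may be mapped/GUPed" and "private" for the
opposite.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SUBPAGES_PER_HUGE 512u	/* 4K fragments per 2M folio (assumed) */
#define BITS_PER_WORD     64u

/* Toy state: bit set = fragment is "shared", bit clear = "private". */
struct toy_huge_folio {
	uint64_t shared[SUBPAGES_PER_HUGE / BITS_PER_WORD];
};

/* Everything starts out private. */
static void toy_init(struct toy_huge_folio *f)
{
	memset(f, 0, sizeof(*f));
}

/* Convert fragments [first, first + count) to shared or back to private. */
static void toy_convert(struct toy_huge_folio *f, unsigned int first,
			unsigned int count, bool to_shared)
{
	assert(first <= SUBPAGES_PER_HUGE && count <= SUBPAGES_PER_HUGE - first);

	for (unsigned int i = first; i < first + count; i++) {
		uint64_t bit = UINT64_C(1) << (i % BITS_PER_WORD);

		if (to_shared)
			f->shared[i / BITS_PER_WORD] |= bit;
		else
			f->shared[i / BITS_PER_WORD] &= ~bit;
	}
}

static bool toy_is_shared(const struct toy_huge_folio *f, unsigned int idx)
{
	assert(idx < SUBPAGES_PER_HUGE);
	return f->shared[idx / BITS_PER_WORD] &
	       (UINT64_C(1) << (idx % BITS_PER_WORD));
}

/* Mapping/GUP of a range would only be allowed if every fragment is shared. */
static bool toy_range_shared(const struct toy_huge_folio *f,
			     unsigned int first, unsigned int count)
{
	assert(first <= SUBPAGES_PER_HUGE && count <= SUBPAGES_PER_HUGE - first);

	for (unsigned int i = first; i < first + count; i++)
		if (!toy_is_shared(f, i))
			return false;
	return true;
}
```

In this model, user space asking to share only the fragment holding
e.g. kvm-clock data would flip a single bit, and a later GUP attempt
covering a neighbouring private fragment would be refused. How the real
conversion is triggered and synchronized is exactly the open question
above.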
--
Cheers,
David / dhildenb