From: Patrick Roy <roypat@amazon.co.uk>
To: David Hildenbrand <david@redhat.com>, <seanjc@google.com>,
	Fuad Tabba <tabba@google.com>
Cc: <pbonzini@redhat.com>, <akpm@linux-foundation.org>,
	<dwmw@amazon.co.uk>, <rppt@kernel.org>, <tglx@linutronix.de>,
	<mingo@redhat.com>, <bp@alien8.de>, <dave.hansen@linux.intel.com>,
	<x86@kernel.org>, <hpa@zytor.com>, <willy@infradead.org>,
	<graf@amazon.com>, <derekmn@amazon.com>, <kalyazin@amazon.com>,
	<kvm@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-mm@kvack.org>, <dmatlack@google.com>,
	<chao.p.peng@linux.intel.com>, <xmarcalx@amazon.co.uk>,
	James Gowans <jgowans@amazon.com>
Subject: Re: [RFC PATCH 8/8] kvm: gmem: Allow restricted userspace mappings
Date: Fri, 12 Jul 2024 16:59:05 +0100	[thread overview]
Message-ID: <e26ec0bb-3c20-4732-a09b-83b6b6a6419a@amazon.co.uk> (raw)
In-Reply-To: <f21d8157-a5e9-4acb-93fc-d040e9b585c8@redhat.com>

Hey,

On Wed, 2024-07-10 at 22:12 +0100, David Hildenbrand wrote:
> On 10.07.24 11:51, Patrick Roy wrote:
>>
>>
>> On 7/9/24 22:13, David Hildenbrand wrote:
>>> On 09.07.24 16:48, Fuad Tabba wrote:
>>>> Hi Patrick,
>>>>
>>>> On Tue, Jul 9, 2024 at 2:21 PM Patrick Roy <roypat@amazon.co.uk> wrote:
>>>>>
>>>>> Allow mapping guest_memfd into userspace. Since AS_INACCESSIBLE is set
>>>>> on the underlying address_space struct, no GUP of guest_memfd will be
>>>>> possible.
>>>>
>>>> This patch allows mapping guest_memfd() unconditionally. Even if it's
>>>> not guppable, there are other reasons why you wouldn't want to allow
>>>> this. Maybe a config flag to gate it? e.g.,
>>>
>>>
>>> As discussed with Jason, maybe not the direction we want to take with
>>> guest_memfd.
>>> If it's private memory, it shall not be mapped. Also not via magic
>>> config options.
>>>
>>> We'll likely discuss some of that in the meeting MM tomorrow I guess
>>> (having both shared and private memory in guest_memfd).
>>
>> Oh, nice. I'm assuming you mean this meeting:
>> https://lore.kernel.org/linux-mm/197a2f19-c71c-fbde-a62a-213dede1f4fd@google.com/T/?
>> Would it be okay if I also attend? I see it also mentions huge pages,
>> which is another thing we are interested in, actually :)
> 
> Hi,
> 
> sorry for the late reply. Yes, you could have joined .... too late.

No worries, I did end up joining to listen in to y'all's discussion
anyway :)

> There will be a summary posted soon. So far the agreement is that we're
> planning on allowing shared memory as part of guest_memfd, and will allow
> that to get mapped and pinned. Private memory is not going to get mapped
> and pinned.
> 
> If we have to disallow pinning of shared memory on top for some use
> cases (i.e., no directmap), I assume that could be added.
> 
>>
>>> Note that just from staring at this commit, I don't understand the
>>> motivation *why* we would want to do that.
>>
>> Fair - I admittedly didn't get into that as much as I probably should
>> have. In our usecase, we do not have anything that pKVM would (I think)
>> call "guest-private" memory. I think our memory can be better described
>> as guest-owned, but always shared with the VMM (e.g. userspace), but
>> ideally never shared with the host kernel. This model lets us do a lot
>> of simplifying assumptions: Things like I/O can be handled in userspace
>> without the guest explicitly sharing I/O buffers (which is not exactly
>> what we would want long-term anyway, as sharing in the guest_memfd
>> context means sharing with the host kernel), we can easily do VM
>> snapshotting without needing things like TDX's TDH.EXPORT.MEM APIs, etc.
> 
> Okay, so essentially you would want to use guest_memfd to only contain
> shared memory and disallow any pinning like for secretmem.

Yeah, this is pretty much what I thought we wanted before listening in
on Wednesday.

I've actually been thinking about this some more since then, though. With
hugepages, if the VM is backed by, say, 2M pages, our on-demand direct
map insertion approach runs into the same problem that CoCo VMs have
when they're backed by hugepages: How to deal with the guest only
sharing a 4K range in a hugepage? If we want to restore the direct map
for e.g. the page containing kvm-clock data, then we can't simply go
ahead and restore the direct map for the entire 2M page, because there
very well might be stuff in the other 511 small guest pages that we
really do not want in the direct map. And we can't even take the
approach of letting the guest deal with the problem, because here
"sharing" is driven by the host, not the guest, so the guest cannot
possibly know that it should avoid putting anything it doesn't want
shared into those remaining 511 pages! To me that sounds a lot like the
whole "breaking down huge folios to allow GUP to only some parts of it"
thing mentioned on Wednesday.
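
To make the granularity point a bit more concrete, below is a rough
sketch (my own illustration, not code from this series) of what
restoring the direct map for just one shared 4K page of a 2M
guest_memfd folio could look like, assuming the existing
set_direct_map_default_noflush() and flush_tlb_kernel_range() helpers
and a made-up function name. On x86 this means splitting the 2M
direct-map mapping that covers the folio, which is exactly why it
feels so similar to the huge-folio-splitting-for-partial-GUP problem:

/* Sketch only. Needs <linux/mm.h>, <linux/set_memory.h>, <asm/tlbflush.h>. */
static int gmem_restore_subpage_direct_map(struct folio *folio,
                                           unsigned long subpage_idx)
{
        struct page *page = folio_page(folio, subpage_idx);
        unsigned long addr = (unsigned long)page_address(page);
        int r;

        /* Re-insert only this 4K page; the other 511 stay unmapped. */
        r = set_direct_map_default_noflush(page);
        if (r)
                return r;

        flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
        return 0;
}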

Now, if we instead treat "guest memory without direct map entries" as
"private", and "guest memory with direct map entries" as "shared", then
the above will be solved by whatever mechanism allows gupping/mapping of
only the "shared" parts of huge folios, IIUC. The fact that GUP is then
also allowed for the "shared" parts is not actually a problem for us -
we went down the route of disabling GUP altogether here because,
based on [1], it sounded like GUP for anything gmem-related would
never happen. But after something is re-inserted into the direct map,
we don't particularly care whether it can be GUP-ed or not. In fact,
allowing GUP for the
shared parts probably makes some things easier for us, as we can then do
I/O without bounce buffers by just in-place converting I/O-buffers to
shared, and then treating that shared slice of guest_memfd the same way
we treat traditional guest memory today. In a very far-off future, we'd
like to be able to do I/O without ever reinserting pages into the direct
map, but I don't think adopting this private/shared model for gmem would
block us from doing that?

All of this does hinge on us being able to do the in-place
shared/private conversion without any guest involvement, though. Do
you envision that being possible?
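
(For the VMM side, I'm picturing something like the rough sketch
below - purely my illustration of the direction discussed above, not
code from this series. It assumes shared guest_memfd ranges become
mmap-able, and that the host-driven conversion is just the existing
KVM_SET_MEMORY_ATTRIBUTES ioctl rather than a guest hypercall; the
function name and the gmem_off parameter are made up.)

/* Sketch only. Needs <sys/ioctl.h>, <sys/mman.h>, <linux/kvm.h>, <err.h>. */
static void *share_and_map_io_buffer(int vm_fd, int gmem_fd,
                                     __u64 gpa, off_t gmem_off,
                                     size_t len)
{
        /* Host-driven conversion: clear KVM_MEMORY_ATTRIBUTE_PRIVATE. */
        struct kvm_memory_attributes attrs = {
                .address    = gpa,
                .size       = len,
                .attributes = 0,
        };
        void *buf;

        if (ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs) < 0)
                err(1, "KVM_SET_MEMORY_ATTRIBUTES");

        /* Map only the now-shared slice of the guest_memfd. */
        buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                   gmem_fd, gmem_off);
        if (buf == MAP_FAILED)
                err(1, "mmap guest_memfd");

        /* Hand buf straight to read()/write()/io_uring - no bouncing. */
        return buf;
}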

> If so, I wonder if it wouldn't be better to simply add KVM support to
> consume *real* secretmem memory? IIRC so far there was only demand to
> probably remove the directmap of private memory in guest_memfd, not of
> shared memory.
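
For reference, by *real* secretmem I assume you mean memfd_secret(2)
memory, i.e. on the userspace side something like the sketch below
(assuming CONFIG_SECRETMEM is enabled; error handling trimmed to the
bare minimum). IIUC the missing piece is exactly that KVM cannot GUP
these pages today, so it has no way to actually map them into the
guest:

/* Sketch only. Needs <sys/syscall.h>, <sys/mman.h>, <unistd.h>, <err.h>. */
int fd = syscall(SYS_memfd_secret, 0);
if (fd < 0)
        err(1, "memfd_secret");

if (ftruncate(fd, len) < 0)
        err(1, "ftruncate");

/* Pages are removed from the kernel direct map when faulted in. */
void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (mem == MAP_FAILED)
        err(1, "mmap secretmem");
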
> -- 
> Cheers,
> 
> David / dhildenb

Best, 
Patrick

[1]: https://lore.kernel.org/all/ZdfoR3nCEP3HTtm1@casper.infradead.org/



Thread overview: 34+ messages
2024-07-09 13:20 [RFC PATCH 0/8] Unmapping guest_memfd from Direct Map Patrick Roy
2024-07-09 13:20 ` [RFC PATCH 1/8] kvm: Allow reading/writing gmem using kvm_{read,write}_guest Patrick Roy
2024-07-09 13:20 ` [RFC PATCH 2/8] kvm: use slowpath in gfn_to_hva_cache if memory is private Patrick Roy
2024-07-09 13:20 ` [RFC PATCH 3/8] kvm: pfncache: enlighten about gmem Patrick Roy
2024-07-09 14:36   ` David Woodhouse
2024-07-10  9:49     ` Patrick Roy
2024-07-10 10:20       ` David Woodhouse
2024-07-10 10:46         ` Patrick Roy
2024-07-10 10:50           ` David Woodhouse
2024-07-09 13:20 ` [RFC PATCH 4/8] kvm: x86: support walking guest page tables in gmem Patrick Roy
2024-07-09 13:20 ` [RFC PATCH 5/8] kvm: gmem: add option to remove guest private memory from direct map Patrick Roy
2024-07-10  7:31   ` Mike Rapoport
2024-07-10  9:50     ` Patrick Roy
2024-07-09 13:20 ` [RFC PATCH 6/8] kvm: gmem: Temporarily restore direct map entries when needed Patrick Roy
2024-07-11  6:25   ` Paolo Bonzini
2024-07-09 13:20 ` [RFC PATCH 7/8] mm: secretmem: use AS_INACCESSIBLE to prohibit GUP Patrick Roy
2024-07-09 21:09   ` David Hildenbrand
2024-07-10  7:32     ` Mike Rapoport
2024-07-10  9:50       ` Patrick Roy
2024-07-10 21:14         ` David Hildenbrand
2024-07-09 13:20 ` [RFC PATCH 8/8] kvm: gmem: Allow restricted userspace mappings Patrick Roy
2024-07-09 14:48   ` Fuad Tabba
2024-07-09 21:13     ` David Hildenbrand
2024-07-10  9:51       ` Patrick Roy
2024-07-10 21:12         ` David Hildenbrand
2024-07-10 21:53           ` Sean Christopherson
2024-07-10 21:56             ` David Hildenbrand
2024-07-12 15:59           ` Patrick Roy [this message]
2024-07-30 10:15             ` David Hildenbrand
2024-08-01 10:30               ` Patrick Roy
2024-07-22 12:28 ` [RFC PATCH 0/8] Unmapping guest_memfd from Direct Map Vlastimil Babka (SUSE)
2024-07-26  6:55   ` Patrick Roy
2024-07-30 10:17     ` David Hildenbrand
2024-07-26 16:44 ` Yosry Ahmed
