From: Yan Zhao <yan.y.zhao@intel.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Chao Peng <chao.p.peng@linux.intel.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
qemu-devel@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>,
Jonathan Corbet <corbet@lwn.net>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
Wanpeng Li <wanpengli@tencent.com>,
Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>,
Hugh Dickins <hughd@google.com>, Jeff Layton <jlayton@kernel.org>,
"J . Bruce Fields" <bfields@fieldses.org>,
Andrew Morton <akpm@linux-foundation.org>,
Yu Zhang <yu.c.zhang@linux.intel.com>,
"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
luto@kernel.org, john.ji@intel.com, susie.li@intel.com,
jun.nakajima@intel.com, dave.hansen@intel.com,
ak@linux.intel.com, david@redhat.com
Subject: Re: [PATCH v3 kvm/queue 14/16] KVM: Handle page fault for private memory
Date: Fri, 14 Jan 2022 13:53:15 +0800 [thread overview]
Message-ID: <20220114055315.GA29165@yzhao56-desk.sh.intel.com> (raw)
In-Reply-To: <YdYFFzlPTvgFdSXL@google.com>
hi Sean,
Sorry for the late reply. I just saw this mail in my mailbox.
On Wed, Jan 05, 2022 at 08:52:39PM +0000, Sean Christopherson wrote:
> On Wed, Jan 05, 2022, Yan Zhao wrote:
> > Sorry, maybe I didn't express it clearly.
> >
> > As in the kvm_faultin_pfn_private(),
> > static bool kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
> > struct kvm_page_fault *fault,
> > bool *is_private_pfn, int *r)
> > {
> > int order;
> > int mem_convert_type;
> > struct kvm_memory_slot *slot = fault->slot;
> > long pfn = kvm_memfd_get_pfn(slot, fault->gfn, &order);
> > ...
> > }
> > Currently, kvm_memfd_get_pfn() is called unconditionally.
> > However, if the backend of a private memslot is not memfd, and is device
> > fd for example, a different xxx_get_pfn() is required here.
>
> Ya, I've complained about this in a different thread[*]. This should really be
> something like kvm_private_fd_get_pfn(), where the underlying ops struct can point
> at any compatible backing store.
>
> https://lore.kernel.org/all/YcuMUemyBXFYyxCC@google.com/
>
ok.
> > Further, though mapped to a private gfn, it might be ok for QEMU to
> > access the device fd in hva-based way (or call it MMU access way, e.g.
> > read/write/mmap), it's desired that it could use the traditional to get
> > pfn without convert the range to a shared one.
>
> No, this is expressly forbidden. The backing store for a private gfn must not
> be accessible by userspace. It's possible a backing store could support both, but
> not concurrently, and any conversion must be done without KVM being involved.
> In other words, resolving a private gfn must either succeed or fail (exit to
> userspace), KVM cannot initiate any conversions.
>
When it comes to a device passthrough via VFIO, there might be more work
related to the device fd as a backend.
First, unlike memfd which can allocate one private fd for a set of PFNs,
and one shared fd for another set of PFNs, for device fd, it needs to open
the same physical device twice, one for shared fd, and one for private fd.
Then, for private device fd, now its ramblock has to use qemu_ram_alloc_from_fd()
instead of current qemu_ram_alloc_from_ptr().
And as in VFIO, this private fd is shared by several ramblocks (each locating from
a different base offset), the base offsets also need to be kept somewhere
in order to call get_pfn successfully. (this info is kept in
vma through mmap() previously, so without mmap(), a new interface might
be required).
Also, for shared device fd, mmap() is required in order to allocate the
ramblock with qemu_ram_alloc_from_ptr(), and more importantly to make
the future gfn_to_hva, and hva_to_pfn possible.
But as the shared and private fds are based on the same physical device,
the vfio driver needs to record which vma ranges are allowed for the actual
mmap_fault, which vma area are not.
With the above changes, it only prevents the host user space from accessing
the device mapped to private GFNs.
For memory backends, host kernel space accessing is prevented via MKTME.
And for device, the device needs to the work to disallow host kernel
space access.
However, unlike memory side, the device side would not cause any MCE.
Thereby, host user space access to the device also would not cause MCEs, either.
So, I'm not sure if the above work is worthwhile to the device fd.
> > pfn = __gfn_to_pfn_memslot(slot, fault->gfn, ...)
> > |->addr = __gfn_to_hva_many (slot, gfn,...)
> > | pfn = hva_to_pfn (addr,...)
> >
> >
> > So, is it possible to recognize such kind of backends in KVM, and to get
> > the pfn in traditional way without converting them to shared?
> > e.g.
> > - specify KVM_MEM_PRIVATE_NONPROTECT to memory regions with such kind
> > of backends, or
> > - detect the fd type and check if get_pfn is provided. if no, go the
> > traditional way.
>
> No, because the whole point of this is to make guest private memory inaccessible
> to host userspace. Or did I misinterpret your questions?
I think the host unmap series is based on the assumption that host user
space access to the memory based to private guest GFNs would cause fatal
MCEs.
So, I hope for backends who will not bring this fatal error can keep
using traditional way to get pfn and be mapped to private GFNs at the
same time.
Thanks
Yan
next prev parent reply other threads:[~2022-01-14 6:11 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-23 12:29 [PATCH v3 kvm/queue 00/16] KVM: mm: fd-based approach for supporting KVM guest " Chao Peng
2021-12-23 12:29 ` [PATCH v3 kvm/queue 01/16] mm/shmem: Introduce F_SEAL_INACCESSIBLE Chao Peng
2022-01-04 14:22 ` David Hildenbrand
2022-01-06 13:06 ` Chao Peng
2022-01-13 15:56 ` David Hildenbrand
2021-12-23 12:29 ` [PATCH v3 kvm/queue 02/16] mm/memfd: Introduce MFD_INACCESSIBLE flag Chao Peng
2021-12-23 12:29 ` [PATCH v3 kvm/queue 03/16] mm/memfd: Introduce MEMFD_OPS Chao Peng
2021-12-24 3:53 ` Robert Hoo
2021-12-31 2:38 ` Chao Peng
2022-01-04 17:38 ` Sean Christopherson
2022-01-05 6:07 ` Chao Peng
2021-12-23 12:29 ` [PATCH v3 kvm/queue 04/16] KVM: Extend the memslot to support fd-based private memory Chao Peng
2021-12-23 17:35 ` Sean Christopherson
2021-12-31 2:53 ` Chao Peng
2022-01-04 17:34 ` Sean Christopherson
2021-12-23 12:30 ` [PATCH v3 kvm/queue 05/16] KVM: Maintain ofs_tree for fast memslot lookup by file offset Chao Peng
2021-12-23 18:02 ` Sean Christopherson
2021-12-24 3:54 ` Chao Peng
2021-12-27 23:50 ` Yao Yuan
2021-12-28 21:48 ` Sean Christopherson
2021-12-31 2:26 ` Chao Peng
2022-01-04 17:43 ` Sean Christopherson
2022-01-05 6:09 ` Chao Peng
2021-12-23 12:30 ` [PATCH v3 kvm/queue 06/16] KVM: Implement fd-based memory using MEMFD_OPS interfaces Chao Peng
2021-12-23 18:34 ` Sean Christopherson
2021-12-23 23:09 ` Paolo Bonzini
2021-12-24 4:25 ` Chao Peng
2021-12-28 22:14 ` Sean Christopherson
2021-12-24 4:12 ` Chao Peng
2021-12-24 4:22 ` Chao Peng
2021-12-23 12:30 ` [PATCH v3 kvm/queue 07/16] KVM: Refactor hva based memory invalidation code Chao Peng
2021-12-23 12:30 ` [PATCH v3 kvm/queue 08/16] KVM: Special handling for fd-based memory invalidation Chao Peng
2021-12-23 12:30 ` [PATCH v3 kvm/queue 09/16] KVM: Split out common memory invalidation code Chao Peng
2021-12-23 12:30 ` [PATCH v3 kvm/queue 10/16] KVM: Implement fd-based memory invalidation Chao Peng
2021-12-23 12:30 ` [PATCH v3 kvm/queue 11/16] KVM: Add kvm_map_gfn_range Chao Peng
2021-12-23 18:06 ` Sean Christopherson
2021-12-24 4:13 ` Chao Peng
2021-12-31 2:33 ` Chao Peng
2022-01-04 17:31 ` Sean Christopherson
2022-01-05 6:14 ` Chao Peng
2022-01-05 17:03 ` Sean Christopherson
2022-01-06 12:35 ` Chao Peng
2021-12-23 12:30 ` [PATCH v3 kvm/queue 12/16] KVM: Implement fd-based memory fallocation Chao Peng
2021-12-23 12:30 ` [PATCH v3 kvm/queue 13/16] KVM: Add KVM_EXIT_MEMORY_ERROR exit Chao Peng
2021-12-23 18:28 ` Sean Christopherson
2021-12-23 12:30 ` [PATCH v3 kvm/queue 14/16] KVM: Handle page fault for private memory Chao Peng
2022-01-04 1:46 ` Yan Zhao
2022-01-04 9:10 ` Chao Peng
2022-01-04 10:06 ` Yan Zhao
2022-01-05 6:28 ` Chao Peng
2022-01-05 7:53 ` Yan Zhao
2022-01-05 20:52 ` Sean Christopherson
2022-01-14 5:53 ` Yan Zhao [this message]
2021-12-23 12:30 ` [PATCH v3 kvm/queue 15/16] KVM: Use kvm_userspace_memory_region_ext Chao Peng
2021-12-23 12:30 ` [PATCH v3 kvm/queue 16/16] KVM: Register/unregister private memory slot to memfd Chao Peng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220114055315.GA29165@yzhao56-desk.sh.intel.com \
--to=yan.y.zhao@intel.com \
--cc=ak@linux.intel.com \
--cc=akpm@linux-foundation.org \
--cc=bfields@fieldses.org \
--cc=bp@alien8.de \
--cc=chao.p.peng@linux.intel.com \
--cc=corbet@lwn.net \
--cc=dave.hansen@intel.com \
--cc=david@redhat.com \
--cc=hpa@zytor.com \
--cc=hughd@google.com \
--cc=jlayton@kernel.org \
--cc=jmattson@google.com \
--cc=john.ji@intel.com \
--cc=joro@8bytes.org \
--cc=jun.nakajima@intel.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=luto@kernel.org \
--cc=mingo@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=seanjc@google.com \
--cc=susie.li@intel.com \
--cc=tglx@linutronix.de \
--cc=vkuznets@redhat.com \
--cc=wanpengli@tencent.com \
--cc=x86@kernel.org \
--cc=yu.c.zhang@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox