From: Nikita Kalyazin <kalyazin@amazon.com>
To: Peter Xu <peterx@redhat.com>
Cc: James Houghton <jthoughton@google.com>,
<akpm@linux-foundation.org>, <pbonzini@redhat.com>,
<shuah@kernel.org>, <kvm@vger.kernel.org>,
<linux-kselftest@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<linux-mm@kvack.org>, <lorenzo.stoakes@oracle.com>,
<david@redhat.com>, <ryan.roberts@arm.com>,
<quic_eberman@quicinc.com>, <graf@amazon.de>,
<jgowans@amazon.com>, <roypat@amazon.co.uk>, <derekmn@amazon.com>,
<nsaenz@amazon.es>, <xmarcalx@amazon.com>
Subject: Re: [RFC PATCH 0/5] KVM: guest_memfd: support for uffd missing
Date: Thu, 13 Mar 2025 15:25:16 +0000
Message-ID: <69dc324f-99fb-44ec-8501-086fe7af9d0d@amazon.com>
In-Reply-To: <Z9HhTjEWtM58Zfxf@x1.local>
On 12/03/2025 19:32, Peter Xu wrote:
> On Wed, Mar 12, 2025 at 05:07:25PM +0000, Nikita Kalyazin wrote:
>> However if MISSING is not registered, the kernel will auto-populate with a
>> clear page, ie there is no way to inject custom content from userspace. To
>> explain my use case a bit more, the population thread will be trying to copy
>> all guest memory proactively, but there will inevitably be cases where a
>> page is accessed through pgtables _before_ it gets populated. It is not
>> desirable for such access to result in a clear page provided by the kernel.
>
> IMHO populating with a zero page in the page cache is fine. It needs to
> make sure all accesses will go via the pgtable, as discussed below in my
> previous email [1], then nobody will be able to see the zero page, not
> until someone updates the content then follow up with a CONTINUE to install
> the pgtable entry.
>
> If there is any way that the page can be accessed without the pgtable
> installation, minor faults won't work indeed.
I think I see what you mean now. I agree, it isn't the end of the world
if the kernel clears the page and then userspace overwrites it.
The way I see it is:
@@ -400,20 +401,26 @@ static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
 	if (WARN_ON_ONCE(folio_test_large(folio))) {
 		ret = VM_FAULT_SIGBUS;
 		goto out_folio;
 	}
 	if (!folio_test_uptodate(folio)) {
 		clear_highpage(folio_page(folio, 0));
 		kvm_gmem_mark_prepared(folio);
 	}
+	if (userfaultfd_minor(vmf->vma)) {
+		folio_unlock(folio);
+		filemap_invalidate_unlock_shared(inode->i_mapping);
+		return handle_userfault(vmf, VM_UFFD_MINOR);
+	}
+
 	vmf->page = folio_file_page(folio, vmf->pgoff);
 out_folio:
 	if (ret != VM_FAULT_LOCKED) {
 		folio_unlock(folio);
 		folio_put(folio);
 	}
On the first fault (cache miss), the kernel will allocate/add/clear the
page (as there is no MISSING trap now), and once the page is in the
cache, a MINOR event will be sent for userspace to copy its content.
Please let me know if these semantics are acceptable.
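To make the intended sequence concrete, here is a minimal toy model in plain C. It is not kernel code: `struct toy_page`, `toy_fault()` and `toy_populate_and_continue()` are hypothetical names made up for illustration, and the model only tracks whether the page is in the cache and whether a pgtable entry has been installed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical model of one guest_memfd page; not a kernel structure. */
struct toy_page {
	bool in_cache;	/* page is present in the page cache */
	bool mapped;	/* pgtable entry installed (after CONTINUE) */
	unsigned char data[TOY_PAGE_SIZE];
};

enum toy_fault_result {
	TOY_FAULT_MINOR_EVENT,	/* userspace must fill the page, then CONTINUE */
	TOY_FAULT_RESOLVED,	/* access proceeds through the pgtable */
};

/* Models the fault path: allocate and clear on a cache miss, then notify. */
static enum toy_fault_result toy_fault(struct toy_page *p, bool minor_registered)
{
	if (p->mapped)
		return TOY_FAULT_RESOLVED;

	if (!p->in_cache) {
		/* Cache miss: the kernel allocates, adds and clears the page. */
		memset(p->data, 0, sizeof(p->data));
		p->in_cache = true;
	}

	/* The page is in the cache now, so a MINOR event can be raised. */
	if (minor_registered)
		return TOY_FAULT_MINOR_EVENT;

	p->mapped = true;
	return TOY_FAULT_RESOLVED;
}

/* Models userspace: copy the content, then install the pgtable entry. */
static void toy_populate_and_continue(struct toy_page *p, const unsigned char *src)
{
	assert(p->in_cache && !p->mapped);
	memcpy(p->data, src, sizeof(p->data));
	p->mapped = true;
}
```

In this model the zeroed page exists in the cache between the fault and the CONTINUE, but nothing can observe it because no pgtable entry is installed until userspace has copied the content.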
Since userspace is notified after KVM calls
kvm_gmem_mark_prepared(), which removes the page from the direct map
[1], userspace can't use write() to populate the content, because write()
relies on the direct map [2]. However, userspace can do a plain memcpy,
which goes through the user pagetables instead. This forces userspace to
respond to stage-2 and VMA faults in guest_memfd differently, via write()
and memcpy respectively. It doesn't seem like a significant problem though.
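A rough userspace sketch of that split, under several assumptions: `enum fault_source`, `populate_page()` and `populate_demo()` are hypothetical names, write() support on guest_memfd comes from the series in [2], pwrite() is used here only as the positional form of write(), and `mapping` is assumed to be a second mmap() of the same file that is not registered with userfaultfd. The demo uses a regular temporary file as a stand-in for guest_memfd, so it only exercises the dispatch, not the direct map behaviour.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical tag for where the fault notification came from. */
enum fault_source {
	FAULT_STAGE2,	/* guest access, reported by KVM */
	FAULT_VMA,	/* access through a userspace mapping, reported by UFFD */
};

/* Copy one page of content into the file; returns 0 on success, -1 on error. */
static int populate_page(enum fault_source source, int fd, void *mapping,
			 off_t offset, const void *content, size_t len)
{
	assert(content != NULL);

	if (source == FAULT_STAGE2) {
		/* Page not prepared yet: write() via the direct map (assumes [2]). */
		ssize_t n = pwrite(fd, content, len, offset);

		return n == (ssize_t)len ? 0 : -1;
	}

	/* Page already prepared: plain memcpy through the user pagetables. */
	memcpy((char *)mapping + offset, content, len);
	return 0;
}

/* Usage sketch with a temporary file standing in for guest_memfd. */
static int populate_demo(void)
{
	enum { DEMO_PAGE = 4096 };
	unsigned char content[DEMO_PAGE];
	unsigned char *map;
	FILE *f = tmpfile();
	int fd, ret;

	if (!f)
		return -1;
	fd = fileno(f);
	if (ftruncate(fd, 2 * DEMO_PAGE) != 0) {
		fclose(f);
		return -1;
	}
	map = mmap(NULL, 2 * DEMO_PAGE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		fclose(f);
		return -1;
	}

	memset(content, 0x5a, sizeof(content));
	ret = populate_page(FAULT_STAGE2, fd, map, 0, content, DEMO_PAGE);
	if (ret == 0)
		ret = populate_page(FAULT_VMA, fd, map, DEMO_PAGE, content, DEMO_PAGE);

	/* Both pages must hold the same content regardless of the path taken. */
	if (ret == 0 && (memcmp(map, content, DEMO_PAGE) != 0 ||
			 memcmp(map + DEMO_PAGE, content, DEMO_PAGE) != 0))
		ret = -1;

	munmap(map, 2 * DEMO_PAGE);
	fclose(f);
	return ret;
}
```

In a real handler the VMA path would be followed by a UFFDIO_CONTINUE on the registered range to install the pgtable entry; that step is left out here.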
I believe that with this approach the original race condition is gone,
because UFFD messages are only sent on a cache hit and it is up to
userspace to serialise writes. Please correct me if I'm wrong here.
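One possible way for userspace to serialise writes between the population thread and the fault handler is a per-page claim, sketched below with C11 atomics. This is only an illustration: `NR_PAGES`, `claim_page()`, `mark_populated()` and `is_populated()` are hypothetical names, not part of any existing VMM or of this series.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 1024	/* hypothetical guest size in pages */

enum page_state { PAGE_EMPTY, PAGE_CLAIMED, PAGE_POPULATED };

/* Zero-initialised, so every page starts as PAGE_EMPTY. */
static _Atomic int states[NR_PAGES];

/* Returns true for exactly one caller per page; that caller copies the data. */
static bool claim_page(size_t idx)
{
	int expected = PAGE_EMPTY;

	assert(idx < NR_PAGES);
	return atomic_compare_exchange_strong(&states[idx], &expected,
					      PAGE_CLAIMED);
}

/* Called by the winner once the content has been copied. */
static void mark_populated(size_t idx)
{
	assert(idx < NR_PAGES);
	atomic_store(&states[idx], PAGE_POPULATED);
}

/* Lets the loser check whether the winner has finished the copy. */
static bool is_populated(size_t idx)
{
	assert(idx < NR_PAGES);
	return atomic_load(&states[idx]) == PAGE_POPULATED;
}
```

Whichever thread loses the claim must not copy the page again; it would wait until is_populated() returns true before issuing the CONTINUE or resuming the vCPU.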
[1]
https://lore.kernel.org/kvm/20250221160728.1584559-1-roypat@amazon.co.uk/T/#mdf41fe2dc33332e9c500febd47e14ae91ad99724
[2]
https://lore.kernel.org/kvm/20241129123929.64790-1-kalyazin@amazon.com/T/#mf5d794aa31d753cbc73e193628f31e418051983d
>>
>>> as long as the content can only be accessed from the pgtable (either via
>>> mmap() or GUP on top of it), then afaiu it could work similarly like
>>> MISSING faults, because anything trying to access it will be trapped.
>
> [1]
>
> --
> Peter Xu
>