From: Peter Xu <peterx@redhat.com>
To: Nikita Kalyazin <kalyazin@amazon.com>
Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	linux-mm@kvack.org, Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	Hugh Dickins <hughd@google.com>,
	James Houghton <jthoughton@google.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Michal Hocko <mhocko@suse.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <seanjc@google.com>,
	Shuah Khan <shuah@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v3 4/5] guest_memfd: add support for userfaultfd minor mode
Date: Mon, 1 Dec 2025 15:57:56 -0500
Message-ID: <aS4BVC42JiUT51rS@x1.local>
In-Reply-To: <76e3d5bf-df73-4293-84f6-0d6ddabd0fd7@amazon.com>

On Mon, Dec 01, 2025 at 08:12:38PM +0000, Nikita Kalyazin wrote:
> 
> 
> On 01/12/2025 18:35, Peter Xu wrote:
> > On Mon, Dec 01, 2025 at 04:48:22PM +0000, Nikita Kalyazin wrote:
> > > I believe I found the precise point where we convinced ourselves that minor
> > > support was sufficient: [1].  If at this moment we don't find that reasoning
> > > valid anymore, then indeed implementing missing is the only option.
> > > 
> > > [1] https://lore.kernel.org/kvm/Z9GsIDVYWoV8d8-C@x1.local
> > 
> > Now after I re-read the discussion, I may have made a wrong statement
> > there, sorry.  I could have got slightly confused on when the write()
> > syscall can be involved.
> > 
> > I agree if you want to get an event when cache missed with the current uffd
> > definitions and when pre-population is forbidden, then MISSING trap is
> > required.  That is, with/without the need of UFFDIO_COPY being available.
> > 
> > Do I understand it right that UFFDIO_COPY is not allowed in your case, but
> > only write()?
> 
> No, UFFDIO_COPY would work perfectly fine.  We will still use write()
> whenever we resolve stage-2 faults as they aren't visible to UFFD.  When a
> userfault occurs at an offset that already has a page in the cache, we will
> have to keep using UFFDIO_CONTINUE so it looks like both will be required:
> 
>  - user mapping major fault -> UFFDIO_COPY (fills the cache and sets up
> userspace PT)
>  - user mapping minor fault -> UFFDIO_CONTINUE (only sets up userspace PT)
>  - stage-2 fault -> write() (only fills the cache)

Is the stage-2 fault here the KVM_MEMORY_EXIT_FLAG_USERFAULT exit from
James's series?

That looks fine, though slightly odd, since you'll then have two ways to
populate the page cache (UFFDIO_COPY and write()).  Logically, atomicity
is indeed not needed here when you trap both MISSING and MINOR.
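
For reference, the three-way split quoted above can be written down as a
small decision function.  This is only a userspace sketch, not kernel or
VMM code: the enum and function names are made up for illustration, and
only the fault -> resolution mapping comes from the list above.

```c
#include <assert.h>

/* Hypothetical names; they only encode the three cases listed above. */
enum fault_source {
	FAULT_USER_MAJOR,	/* user mapping fault, no page in the cache */
	FAULT_USER_MINOR,	/* user mapping fault, page already cached */
	FAULT_STAGE2,		/* stage-2 fault, not visible to uffd */
};

enum resolve_method {
	RESOLVE_UFFDIO_COPY,
	RESOLVE_UFFDIO_CONTINUE,
	RESOLVE_WRITE,
};

struct resolve_effect {
	enum resolve_method method;
	int fills_cache;	/* populates the page cache */
	int maps_user_pt;	/* sets up the userspace page table */
};

static struct resolve_effect pick_resolution(enum fault_source src)
{
	switch (src) {
	case FAULT_USER_MAJOR:
		return (struct resolve_effect){ RESOLVE_UFFDIO_COPY, 1, 1 };
	case FAULT_USER_MINOR:
		return (struct resolve_effect){ RESOLVE_UFFDIO_CONTINUE, 0, 1 };
	case FAULT_STAGE2:
		return (struct resolve_effect){ RESOLVE_WRITE, 1, 0 };
	}
	assert(!"unknown fault source");
	return (struct resolve_effect){ RESOLVE_WRITE, 0, 0 };
}
```

Both UFFDIO_COPY and write() end up with fills_cache set, which is the
"two ways to populate the page cache" part.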

> 
> > 
> > One way that might work this around, is introducing a new UFFD_FEATURE bit
> > allowing the MINOR registration to trap all pgtable faults, which will
> > change the MINOR fault semantics.
> 
> This would equally work for us.  I suppose this MINOR+MAJOR semantics would
> be more intrusive from the API point of view though.

Yes, it is.  I just don't know whether fully supporting UFFDIO_COPY here
would turn out to be harder, given the previous discussions.

On second thought, such a UFFD_FEATURE bit is probably not a good design:
it would functionally overlap with what the MISSING trap already does,
duplicating that concept in a VMA that was registered as MINOR only.

Maybe we could instead allow a module to support the MISSING trap without
supporting the UFFDIO_COPY ioctl.

That is, MISSING events will be generated properly if MISSING traps are
supported, but the module needs to provide its own way to resolve them if
the UFFDIO_COPY ioctl isn't available.  Gmem is fine in this case as long
as it is always registered with both MISSING+MINOR traps; resolving with
write()s would then work.

That would be possible with something like my earlier v3:

https://lore.kernel.org/all/20250926211650.525109-1-peterx@redhat.com/#t

Then gmem needs to declare VM_UFFD_MISSING + VM_UFFD_MINOR in
uffd_features, but _UFFDIO_CONTINUE only (without _UFFDIO_COPY) in
uffd_ioctls.
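
A rough standalone sketch of what such a declaration could look like.
This is not the code from that series: the struct, the sk_ helpers and
the SK_ bit values are placeholders invented here so the snippet compiles
in userspace.  Only the two field names and the idea of "MISSING + MINOR
in features, CONTINUE but no COPY in ioctls" are taken from the text
above.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bits, NOT the real VM_UFFD_* or _UFFDIO_* values. */
#define SK_VM_UFFD_MISSING	(1UL << 0)
#define SK_VM_UFFD_MINOR	(1UL << 1)

#define SK_UFFDIO_COPY		(1UL << 0)
#define SK_UFFDIO_CONTINUE	(1UL << 1)

/* Hypothetical stand-in for a per-memory-type uffd descriptor. */
struct sk_uffd_ops {
	unsigned long uffd_features;	/* fault modes the type can trap */
	unsigned long uffd_ioctls;	/* resolution ioctls it implements */
};

/* gmem: traps MISSING and MINOR, resolves via CONTINUE only. */
static const struct sk_uffd_ops sk_gmem_uffd_ops = {
	.uffd_features	= SK_VM_UFFD_MISSING | SK_VM_UFFD_MINOR,
	.uffd_ioctls	= SK_UFFDIO_CONTINUE,	/* no SK_UFFDIO_COPY */
};

/* True if every mode bit in @mode can be trapped by this type. */
static bool sk_can_trap(const struct sk_uffd_ops *ops, unsigned long mode)
{
	assert(ops);
	return (ops->uffd_features & mode) == mode;
}

/* True if the type implements the given resolution ioctl. */
static bool sk_has_ioctl(const struct sk_uffd_ops *ops, unsigned long ioctl_bit)
{
	assert(ops);
	return (ops->uffd_ioctls & ioctl_bit) != 0;
}
```

With this split, a MISSING registration is accepted even though
sk_has_ioctl(&sk_gmem_uffd_ops, SK_UFFDIO_COPY) is false, so userspace
has to fill the cache some other way (write() for gmem).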

Since Mike already took this series over, I'll leave that to you all to
decide.

-- 
Peter Xu


