From: Peter Xu <peterx@redhat.com>
To: "Liam R. Howlett" <Liam.Howlett@oracle.com>,
David Hildenbrand <david@redhat.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Mike Rapoport <rppt@kernel.org>,
Muchun Song <muchun.song@linux.dev>,
Nikita Kalyazin <kalyazin@amazon.com>,
Vlastimil Babka <vbabka@suse.cz>,
Axel Rasmussen <axelrasmussen@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
James Houghton <jthoughton@google.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Hugh Dickins <hughd@google.com>, Michal Hocko <mhocko@suse.com>,
Ujwal Kundur <ujwal.kundur@gmail.com>,
Oscar Salvador <osalvador@suse.de>,
Suren Baghdasaryan <surenb@google.com>,
Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: [PATCH v4 0/4] mm/userfaultfd: modulize memory types
Date: Tue, 21 Oct 2025 12:28:17 -0400
Message-ID: <aPe0oWR9-Oj58Asz@x1.local>
In-Reply-To: <dtepn7obw5syd47uhyxavytodp7ws2pzr2yuchda32wcwn4bj4@wazn24gijumu>
On Tue, Oct 21, 2025 at 11:51:33AM -0400, Liam R. Howlett wrote:
> * Peter Xu <peterx@redhat.com> [251020 10:12]:
> > On Mon, Oct 20, 2025 at 03:34:47PM +0200, David Hildenbrand wrote:
> > > On 15.10.25 01:14, Peter Xu wrote:
> > > > [based on latest akpm/mm-new of Oct 14th, commit 36c6c5ce1b275]
> > > >
> > > > v4:
> > > > - Some cleanups within vma_can_userfault() [David]
> > > > - Rename uffd_get_folio() to minor_get_folio() [David]
> > > > - Remove uffd_features in vm_uffd_ops, deduce it from supported ioctls [David]
> > > >
> > > > v1: https://lore.kernel.org/r/20250620190342.1780170-1-peterx@redhat.com
> > > > v2: https://lore.kernel.org/r/20250627154655.2085903-1-peterx@redhat.com
> > > > v3: https://lore.kernel.org/r/20250926211650.525109-1-peterx@redhat.com
> > > >
> > > > This series is an alternative proposal of what Nikita proposed here on the
> > > > initial three patches:
> > > >
> > > > https://lore.kernel.org/r/20250404154352.23078-1-kalyazin@amazon.com
> > > >
> > > > This is not yet relevant to any guest-memfd support, but paving way for it.
> > > > Here, the major goal is to make kernel modules be able to opt-in with any
> > > > form of userfaultfd supports, like guest-memfd. This alternative option
> > > > should hopefully be cleaner, and avoid leaking userfault details into
> > > > vm_ops.fault().
> > > >
> > > > It also means this series does not depend on anything. It's a pure
> > > > refactoring of userfaultfd internals to provide a generic API, so that
> > > > other types of files, especially RAM based, can support userfaultfd without
> > > > touching mm/ at all.
> > > >
> > > > To achieve it, this series introduced a file operation called vm_uffd_ops.
> > > > The ops needs to be provided when a file type supports any of userfaultfd.
> > > >
> > > > With that, I moved both hugetlbfs and shmem over, whenever possible. So
> > > > far due to concerns on exposing an uffd_copy() API, the MISSING faults are
> > > > still separately processed and can only be done within mm/. Hugetlbfs kept
> > > > its special paths untouched.
> > > >
> > > > An example of shmem uffd_ops:
> > > >
> > > > >   static const struct vm_uffd_ops shmem_uffd_ops = {
> > > > >           .supported_ioctls = BIT(_UFFDIO_COPY) |
> > > > >                               BIT(_UFFDIO_ZEROPAGE) |
> > > > >                               BIT(_UFFDIO_WRITEPROTECT) |
> > > > >                               BIT(_UFFDIO_CONTINUE) |
> > > > >                               BIT(_UFFDIO_POISON),
> > > > >           .minor_get_folio = shmem_uffd_get_folio,
> > > > >   };
>
> I think you forgot to add the link to the guest_memfd implementation [1]
> to your cover letter.

I didn't.

https://lore.kernel.org/all/20251014231501.2301398-1-peterx@redhat.com/

  To show another sample, this is the patch that Nikita posted to implement
  minor fault for guest-memfd (on top of older versions of this series):

  https://lore.kernel.org/all/114133f5-0282-463d-9d65-3143aa658806@amazon.com/

>
> > >
> > > This looks better than the previous version to me.
> > >
> > > Long term the goal should be to move all hugetlb/shmem specific stuff out of
> > > mm/hugetlb.c and of course, we won't be adding any new ones to
> > > mm/userfaultfd.c
> > >
> > > I agree with Liam that a better interface could be providing default
> > > handlers for the separate ioctls [1], but there is always the option to
> > > evolve this interface into something like that later.
> >
> > Thanks for accepting this current form.
> >
> > >
> > >
> > > [1] https://lkml.kernel.org/r/frnos5jtmlqvzpcrredcoummuzvllweku5dgp5ii5in6epwnw5@anu4dqsz6shy
> >
> > I have replied to that, here:
> >
> > https://lore.kernel.org/all/aOVEDii4HPB6outm@x1.local/
> >
> > If we ignore hugetlbfs, most of the hooks may not be needed, as explained.
>
> Those were examples.
>
> Hooks allow for all the memory type checking to go away in the code,
> which allows for more readable code and less operations per call.
>
> >
> > If we introduce hooks only for hugetlbfs, IMHO it's going backwards. When
> > we want to get rid of hugetlbfs paths, we will have something more to get
> > rid of..
>
> This is just wrong.
>
> It is far easier to remove one function pointer than go through all the
> code and remove the checks for hugetlbfs.
>
> Are you thinking the hooks will just point to the generic function?
> This is the only way I can see your statement making sense. That's not
> the idea I'm trying to communicate.
>
> The idea is that you split the functions into parts that everyone does
> and special parts, then call them in the correct sequence for each type.
> New types need new special parts while using the generic code for the
> majority of the work.
>
> In this way, the memory types are modularized into function pointers
> that all use common code without adding complexity. In fact, knowing
> the context implicitly from the call path means we don't need to check
> the types, which should reduce the complexity.
>
> Then adding a new memory type will call almost all the same functions
> except for special areas.
>
> Removing old memory types would mean removing the special areas only - and
> maybe a function pointer if they are the only user.
>
> The current patch set does not modularize memory types; it creates a
> middleware level where we have to parse a value to figure out what to
> do.
>
> These patches DO expose a method for memory types to be coded in a
> kernel module, which is fundamentally different than modularizing the
> memory types. Different enough to be glossed over on a ML by looking at
> the subject alone.
>
> Yes, one value is better than two values, but no magic values is ideal.
>
> Is it a significant amount of work to remove the magic value by
> fragmenting the code into memory type specific function pointers?
>
> IOW, instead of decoding the value to figure out where to route calls,
> just expose the calls directly in the function pointer layer that you
> are creating? What is the minimum amount of function pointers to get
> the guest_memfd to work without this value being parsed?
>
> [1]. https://lore.kernel.org/all/114133f5-0282-463d-9d65-3143aa658806@amazon.com/
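
For reference, the two shapes being argued over above can be put side by
side in a small userspace sketch. This is only an illustration, not code
from the series or from any proposed alternative: every name in it
(ops_mask, ops_hooks, dispatch_mask, dispatch_hooks, the OP_* values and
the demo_* helpers) is hypothetical, and the "folio lookup" is reduced to
returning the page offset. It assumes the mask-based form means core code
tests a bit and then routes the call itself, and the hook-based form
means each memory type supplies its own entry point per operation.

```c
#include <assert.h>

/* Hypothetical stand-ins; none of these names exist in the kernel tree. */
enum { OP_COPY, OP_ZEROPAGE, OP_CONTINUE, OP_COUNT };
#define BIT(n) (1UL << (n))

/* Shape A: a capability mask plus one helper; core code decodes the mask. */
struct ops_mask {
	unsigned long supported_ioctls;
	int (*minor_get_folio)(unsigned long pgoff);
};

/* Shape B: one hook per operation; an unset hook means "unsupported". */
struct ops_hooks {
	int (*op[OP_COUNT])(unsigned long pgoff);
};

/* Toy "memory type": the lookup just echoes the page offset back. */
static int demo_get_folio(unsigned long pgoff) { return (int)pgoff; }
static int demo_continue(unsigned long pgoff) { return demo_get_folio(pgoff); }

static const struct ops_mask demo_mask = {
	.supported_ioctls = BIT(OP_CONTINUE),
	.minor_get_folio = demo_get_folio,
};

static const struct ops_hooks demo_hooks = {
	.op = { [OP_CONTINUE] = demo_continue },
};

/* Shape A dispatch: test the bit, then pick the code path centrally. */
static int dispatch_mask(const struct ops_mask *ops, int ioctl,
			 unsigned long pgoff)
{
	assert(ioctl >= 0 && ioctl < OP_COUNT);
	if (!(ops->supported_ioctls & BIT(ioctl)))
		return -1;
	switch (ioctl) {
	case OP_CONTINUE:
		return ops->minor_get_folio(pgoff);
	default:
		return -1;	/* other operations are not modeled here */
	}
}

/* Shape B dispatch: no decoding, just call the hook if the type set one. */
static int dispatch_hooks(const struct ops_hooks *ops, int ioctl,
			  unsigned long pgoff)
{
	assert(ioctl >= 0 && ioctl < OP_COUNT);
	if (!ops->op[ioctl])
		return -1;
	return ops->op[ioctl](pgoff);
}
```

In both shapes an unsupported operation is rejected up front; the
difference is only where the routing decision lives, in the central
switch (shape A) or in the per-type table (shape B).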

I don't know what you're looking for.

I think I have had acks from most of the userfaultfd developers who were
active in the past few years, ever since v1...

Then there was some concern that the uffd_copy() API was too complicated.
That's fine, I dropped it.

There was another concern about having a function that returns a folio
pointer. Luckily we talked it all through, even if I do not know what
really happened.

Now, I really don't know what you're suggesting here.

Could you please send some patches and show us the code, to help everyone
support guest-memfd minor fault?

--
Peter Xu