From: Nikita Kalyazin <kalyazin@amazon.com>
To: "Liam R. Howlett" <Liam.Howlett@oracle.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Peter Xu <peterx@redhat.com>,
David Hildenbrand <david@redhat.com>,
"Mike Rapoport" <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>, <linux-mm@kvack.org>,
<linux-kernel@vger.kernel.org>, Vlastimil Babka <vbabka@suse.cz>,
Muchun Song <muchun.song@linux.dev>,
Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
James Houghton <jthoughton@google.com>,
Michal Hocko <mhocko@suse.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Oscar Salvador <osalvador@suse.de>,
Axel Rasmussen <axelrasmussen@google.com>,
Ujwal Kundur <ujwal.kundur@gmail.com>
Subject: Re: [PATCH v2 1/4] mm: Introduce vm_uffd_ops API
Date: Mon, 1 Sep 2025 17:01:11 +0100 [thread overview]
Message-ID: <930d8830-3d5d-496d-80d8-b716ea6446bb@amazon.com> (raw)
In-Reply-To: <e7vr62s73dftijeveyg6lfgivctijz4qcar3teswjbuv6gog3k@4sbpuj35nbbh>
On 04/07/2025 20:39, Liam R. Howlett wrote:
> * Peter Xu <peterx@redhat.com> [250704 11:00]:
>> On Fri, Jul 04, 2025 at 11:34:15AM +0200, David Hildenbrand wrote:
>>> On 03.07.25 19:48, Mike Rapoport wrote:
>>>> On Wed, Jul 02, 2025 at 03:46:57PM -0400, Peter Xu wrote:
>>>>> On Wed, Jul 02, 2025 at 08:39:32PM +0300, Mike Rapoport wrote:
>>>>>
>>>>> [...]
>>>>>
>>>>>>> The main target of this change is the implementation of UFFD for
>>>>>>> KVM/guest_memfd (examples: [1], [2]) to avoid bringing KVM-specific code
>>>>>>> into the mm codebase. We usually mean KVM by the "drivers" in this context,
>>>>>>> and it is already somewhat "knowledgeable" of the mm. I don't think there
>>>>>>> are existing use cases for other drivers to implement this at the moment.
>>>>>>>
>>>>>>> Although I can't see new exports in this series, there is now a way to limit
>>>>>>> exports to particular modules [3]. Would it help if we only do it for KVM
>>>>>>> initially (if/when actually needed)?
>>>>>>
>>>>>> There were talks about pulling out guest_memfd core into mm, but I don't
>>>>>> remember patches about it. If parts of guest_memfd were already in mm/ that
>>>>>> would make easier to export uffd ops to it.
>>>>>
>>>>> Do we have a link to such discussion? I'm also curious whether that idea
>>>>> was acknowledged by KVM maintainers.
>>>>
>>>> AFAIR it was discussed at one of David's guest_memfd calls
>>>
>>> While it was discussed in the call a couple of times in different context
>>> (guest_memfd as a library / guest_memfd shim), I think we already discussed
>>> it back at LPC last year.
>>>
>>> One of the main reasons for doing that is supporting guest_memfd in other
>>> hypervisors -- the gunyah hypervisor in the kernel wants to make use of it
>>> as well.
>>
>> I see, thanks for the info. I found the series, it's here:
>>
>> https://lore.kernel.org/all/20241113-guestmem-library-v3-0-71fdee85676b@quicinc.com/
>>
>> Here, the question is whether do we still want to keep explicit calls to
>> shmem, hugetlbfs and in the future, guest-memfd. The library-ize of
>> guest-memfd doesn't change a huge lot on answering this question, IIUC.
>
> Can you explore moving hugetlb_mfill_atomic_pte and
> shmem_mfill_atomic_pte into mm/userfaultfd.c and generalizing them to
> use the same code?
>
> That is, split the similar blocks into functions and reduce duplication.
>
> These are under the UFFD config option and are pretty similar. This
> will also limit the bleeding of mfill_atomic_mode out of uffd.
>
>
>
> If you look at what the code does in userfaultfd.c, you can see that
> some refactoring is necessary for other reasons:
>
> mfill_atomic() calls mfill_atomic_hugetlb(), or it enters a while
> (src_addr < src_start + len) to call mfill_atomic_pte().. which might
> call shmem_mfill_atomic_pte() in mm/shmem.c
>
> mfill_atomic_hugetlb() calls, in a while (src_addr < src_start + len)
> loop and calls hugetlb_mfill_atomic_pte() in mm/hugetlb.c
>
> The shmem call already depends on the vma flags.. which it still does in
> your patch 4 here. So you've replaced this:
>
> if (!(dst_vma->vm_flags & VM_SHARED)) {
> ...
> } else {
> shmem_mfill_atomic_pte()
> }
>
> With...
>
> if (!(dst_vma->vm_flags & VM_SHARED)) {
> ...
> } else {
> ...
> uffd_ops->uffd_copy()
> }
>
> So, really, what needs to happen first is userfaultfd needs to be
> sorted.
>
> There's no point of creating a vm_ops_uffd if it will just serve as
> replacing the call locations of the functions like this, as it has done
> nothing to simplify the logic.
>
>> However if we want to generalize userfaultfd capability for a type of
>> memory, we will still need something like the vm_uffd_ops hook to report
>> such information. It means drivers can still overwrite these, with/without
>> an exported mfill_atomic_install_pte() functions. I'm not sure whether
>> that eases the concern.
>
> If we work through the duplication and reduction where possible, the
> path forward may be easier to see.
>
>>
>> So to me, generalizing the mem type looks helpful with/without moving
>> guest-memfd under mm/.
>
> Yes, it should decrease the duplication across hugetlb.c and shmem.c,
> but I think that userfaultfd is the place to start.
>
>>
>> We do have the option to keep hard-code guest-memfd like shmem or
>> hugetlbfs. This is still "doable", but this likely means guest-memfd
>> support for userfaultfd needs to be done after that work. I did quickly
>> check the status of gunyah hypervisor [1,2,3], I found that all of the
>> efforts are not yet continued in 2025. The hypervisor last update was Jan
>> 2024 with a batch push [1].
>>
>> I still prefer generalizing uffd capabilities using the ops. That makes
>> guest-memfd support on MINOR not be blocked and it should be able to be
>> done concurrently v.s. guest-memfd library. If guest-memfd library idea
>> didn't move on, it's non-issue either.
>>
>> I've considered dropping uffd_copy() and MISSING support for vm_uffd_ops if
>> I'm going to repost - that looks like the only thing that people are
>> against with, even though that is not my preference, as that'll make the
>> API half-broken on its own.
>
> The generalisation you did does not generalize much, as I pointed out
> above, and so it seems less impactful than it could be.
>
> These patches also do not explore what this means for guest_memfd. So
> it is not clear that the expected behaviour will serve the need.
>
> You sent a link to an example user. Can you please keep this work
> together in the patch set so that we know it'll work for your use case
> and allows us an easier way to pull down this work so we can examine it.
Hi Liam, Lorenzo,
With mmap support in guest_memfd being recently accepted and merged into
kvm/next [1], UFFDIO_CONTINUE support in guest_memfd becomes a real use
case.
From what I understand, it is the API for UFFDIO_COPY (ie the
.uffd_copy callback) proposed by Peter that raises the safery issues,
while the generalisation of the checks (.uffd_features, .uffd_ioctls)
and .uffd_get_folio needed for UFFDIO_CONTINUE do not introduce such
concerns. In order to unblock the userfaultfd support in guest_memfd,
would it be acceptable to start with implementing
.uffd_get_folio/UFFDIO_CONTINUE only, leaving the callback for
UFFDIO_COPY for later when we have an idea about a safer API and a clear
use case for that?
If that's the case, what would be the best way to demonstrate the client
code? Would using kvm/next as the base for the aggregated series (new
mm API + client code in kvm/guest_memfd) work?
Thanks,
Nikita
[1]: https://lore.kernel.org/kvm/20250729225455.670324-1-seanjc@google.com/
>
> Alternatively, you can reduce and combine the memory types without
> exposing the changes externally, if they stand on their own. But I
> don't think anyone is going to accept using a vm_ops change where a
> direct function call could be used.
>
>> Said that, I still prefer this against
>> hard-code and/or use CONFIG_GUESTMEM in userfaultfd code.
>
> It also does nothing with regards to the CONFIG_USERFAULTFD in other
> areas. My hope is that you could pull out the common code and make the
> CONFIG_ sections smaller.
>
> And, by the way, it isn't the fact that we're going to copy something
> (or mfill_atomic it, I guess?) but the fact that we're handing out the
> pointer for something else to do that. Please handle this manipulation
> in the core mm code without handing out pointers to folios, or page
> tables.
>
> You could do this with the address being passed in and figure out the
> type, or even a function pointer that you specifically pass in an enum
> of the type (I think this is what Lorenzo was suggesting somewhere),
> maybe with multiple flags for actions and fallback (MFILL|ZERO for
> example).
>
> Thanks,
> Liam
>
next prev parent reply other threads:[~2025-09-01 16:01 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-27 15:46 [PATCH v2 0/4] mm/userfaultfd: modulize memory types Peter Xu
2025-06-27 15:46 ` [PATCH v2 1/4] mm: Introduce vm_uffd_ops API Peter Xu
2025-06-29 8:50 ` Mike Rapoport
2025-07-02 19:30 ` Peter Xu
2025-06-30 10:15 ` Lorenzo Stoakes
2025-07-01 17:04 ` Suren Baghdasaryan
2025-07-02 15:40 ` Liam R. Howlett
2025-07-02 15:56 ` Lorenzo Stoakes
2025-07-02 17:08 ` Nikita Kalyazin
2025-07-02 17:39 ` Mike Rapoport
2025-07-02 19:46 ` Peter Xu
2025-07-03 17:48 ` Mike Rapoport
2025-07-04 9:34 ` David Hildenbrand
2025-07-04 14:59 ` Peter Xu
2025-07-04 19:39 ` Liam R. Howlett
2025-09-01 16:01 ` Nikita Kalyazin [this message]
2025-09-08 16:53 ` Liam R. Howlett
2025-09-16 20:05 ` Peter Xu
2025-09-17 15:29 ` Liam R. Howlett
2025-09-17 9:25 ` Mike Rapoport
2025-09-17 16:53 ` Liam R. Howlett
2025-09-18 8:37 ` Mike Rapoport
2025-09-18 16:47 ` Liam R. Howlett
2025-09-18 17:15 ` Nikita Kalyazin
2025-09-18 17:45 ` Lorenzo Stoakes
2025-09-18 17:53 ` David Hildenbrand
2025-09-18 18:20 ` Peter Xu
2025-09-18 19:43 ` Liam R. Howlett
2025-09-18 21:07 ` Peter Xu
2025-09-19 1:50 ` Liam R. Howlett
2025-09-19 14:16 ` Peter Xu
2025-09-19 14:34 ` Lorenzo Stoakes
2025-09-19 15:12 ` Peter Xu
2025-09-19 19:38 ` Liam R. Howlett
2025-09-22 16:33 ` Peter Xu
2025-09-22 17:20 ` David Hildenbrand
2025-09-22 18:03 ` Peter Xu
2025-09-18 17:54 ` Liam R. Howlett
2025-09-18 18:05 ` Mike Rapoport
2025-09-18 18:32 ` Liam R. Howlett
2025-09-18 19:32 ` Peter Xu
2025-09-19 9:05 ` Mike Rapoport
2025-09-16 19:55 ` Peter Xu
2025-09-19 17:22 ` Liam R. Howlett
2025-09-22 16:38 ` Peter Xu
2025-07-02 21:24 ` Liam R. Howlett
2025-07-02 21:36 ` Peter Xu
2025-07-03 2:00 ` Liam R. Howlett
2025-07-03 15:24 ` Peter Xu
2025-07-03 16:15 ` Lorenzo Stoakes
2025-07-03 17:39 ` Liam R. Howlett
2025-07-02 20:24 ` Peter Xu
2025-07-03 16:32 ` Lorenzo Stoakes
2025-07-02 18:16 ` Mike Rapoport
2025-07-02 20:22 ` Peter Xu
2025-07-03 15:01 ` Suren Baghdasaryan
2025-07-03 15:45 ` Peter Xu
2025-07-03 16:01 ` Lorenzo Stoakes
2025-06-27 15:46 ` [PATCH v2 2/4] mm/shmem: Support " Peter Xu
2025-06-29 8:51 ` Mike Rapoport
2025-06-27 15:46 ` [PATCH v2 3/4] mm/hugetlb: " Peter Xu
2025-06-29 8:52 ` Mike Rapoport
2025-06-27 15:46 ` [PATCH v2 4/4] mm: Apply vm_uffd_ops API to core mm Peter Xu
2025-06-29 8:55 ` Mike Rapoport
2025-07-02 20:38 ` Peter Xu
2025-06-30 10:29 ` [PATCH v2 0/4] mm/userfaultfd: modulize memory types Lorenzo Stoakes
2025-07-01 0:15 ` Andrew Morton
2025-07-02 20:36 ` Peter Xu
2025-07-03 15:55 ` Lorenzo Stoakes
2025-07-03 16:26 ` Peter Xu
2025-07-03 16:44 ` Lorenzo Stoakes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=930d8830-3d5d-496d-80d8-b716ea6446bb@amazon.com \
--to=kalyazin@amazon.com \
--cc=Liam.Howlett@oracle.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=axelrasmussen@google.com \
--cc=david@redhat.com \
--cc=hughd@google.com \
--cc=jthoughton@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhocko@suse.com \
--cc=muchun.song@linux.dev \
--cc=osalvador@suse.de \
--cc=peterx@redhat.com \
--cc=rppt@kernel.org \
--cc=surenb@google.com \
--cc=ujwal.kundur@gmail.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox