From: "David P. Reed" <dpreed@deepplum.com>
To: "James Houghton" <jthoughton@google.com>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
linux-mm@kvack.org, "Peter Xu" <peterx@redhat.com>,
"Axel Rasmussen" <axelrasmussen@google.com>
Subject: Re: PROBLEM: userfaultfd REGISTER minor mode on MAP_PRIVATE range fails
Date: Mon, 15 Sep 2025 18:58:48 -0400 (EDT)
Message-ID: <1757977128.137610687@apps.rackspace.com>
In-Reply-To: <CADrL8HWGcj1oANGY=qAzpYi_-E-Xbi=L28Bmyyf8H7auVix=QQ@mail.gmail.com>
On Monday, September 15, 2025 16:24, "James Houghton" <jthoughton@google.com> said:
> On Mon, Sep 15, 2025 at 1:13 PM David P. Reed <dpreed@deepplum.com> wrote:
>>
>>
>> [1.] One line summary of the problem: userfaultfd REGISTER minor mode on
>> MAP_PRIVATE fails
>> [2.] Full description of the problem/report:
>> The userfaultfd man page and the kernel docs seem to indicate that an area
>> mapped
>> MAP_PRIVATE|MAP_ANONYMOUS can be registered to handle MINOR page faults on
>> regular pages.
>> However, testing showed that not to work. MAP_SHARED does allow registration for
>> MINOR
>> page fault events, though.
>> Either the documentation or the code should be fixed, IMO. Now reading the code
>> that rejects
>> this case in the kernel source, the test in vma_can_userfault() that rejects this
>> is this
>> line:
>> if ((vm_flags & VM_UFFD_MINOR) &&
>> (!is_vm_hugetlb_page(vma) && !vma_is_shmem(vma)))
>> return false;
>> which probably should include !vma_is_anonymous(vma).
>>
>> Or maybe the COW that might happen if the program were forked is something that
>> can't be handled, which seems odd.
>
> UFFDIO_CONTINUE, the resolution ioctl for userfaultfd minor faults,
> doesn't have defined semantics for MAP_PRIVATE mappings. The
> documentation is unclear that MAP_PRIVATE + userfaultfd minor faults
> is invalid, but this is intentional behavior.
>
> What would you like UFFDIO_CONTINUE on MAP_PRIVATE to do? Should it
> populate a read-only PTE? Should it do CoW and populate a writable
> PTE? I'm curious to hear more about your use case (and why UFFDIO_COPY
> doesn't do what you want).
>
Well, I was just expecting UFFDIO_CONTINUE to do whatever "normally" gets done. In the normal case for MAP_PRIVATE|MAP_ANONYMOUS, if the page is in the swap cache and thus takes a minor fault, what happens depends on whether the access is a write or a read.
For a read, the page just gets installed in the page map from the swap cache.
For a write, if the page hasn't yet been copied, a copy is made of the swap cache contents of that page at that point, and the new copy is installed into the page table of the writing process.
However, the problem I'm reporting is that I can't even register such a page for minor page faults.
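The registration failure can be reproduced with something like the following. This is a minimal sketch, not the actual test program: the helper name try_minor_register() is made up for illustration, it assumes a Linux system whose headers define UFFDIO_REGISTER_MODE_MINOR and UFFD_FEATURE_MINOR_SHMEM, and it treats any setup failure (no permission, no kernel support) as "unavailable" rather than as a rejection.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the kernel or any library): map one page
 * with the given mmap flags and try to register it for minor faults.
 * Returns 1 if UFFDIO_REGISTER succeeded, 0 if the kernel rejected it,
 * and -1 if userfaultfd minor mode could not be set up at all.
 */
int try_minor_register(int map_flags)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    /* UFFD_USER_MODE_ONLY allows use when unprivileged uffd is restricted. */
    int uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | UFFD_USER_MODE_ONLY);
    if (uffd < 0)
        return -1;

    /* API handshake, asking for the shmem minor-fault feature. */
    struct uffdio_api api = {
        .api = UFFD_API,
        .features = UFFD_FEATURE_MINOR_SHMEM,
    };
    if (ioctl(uffd, UFFDIO_API, &api) < 0) {
        close(uffd);
        return -1;
    }

    void *addr = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                      map_flags, -1, 0);
    if (addr == MAP_FAILED) {
        close(uffd);
        return -1;
    }

    /* This is the call that the vma_can_userfault() check accepts or rejects. */
    struct uffdio_register reg = {
        .range = { .start = (unsigned long)addr, .len = (unsigned long)page },
        .mode = UFFDIO_REGISTER_MODE_MINOR,
    };
    int ok = ioctl(uffd, UFFDIO_REGISTER, &reg) == 0;

    munmap(addr, (size_t)page);
    close(uffd);
    return ok;
}
```

With this helper, try_minor_register(MAP_PRIVATE | MAP_ANONYMOUS) is the case reported here as rejected, and try_minor_register(MAP_SHARED | MAP_ANONYMOUS) is the case that was accepted. Either call returns -1 when userfaultfd cannot be used at all.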
Now there is a question of what the meaning of UFFDIO_COPY (not UFFDIO_CONTINUE) should be. If the page is MAP_PRIVATE, UFFDIO_COPY is like writing to the page at the time of the minor fault. So the version of the data in the swap cache for the page should be ignored; replacing the local version makes sense. Any other process that still has the original version from the time of the fork() that shared the page should not be affected, I would think.
There is a confusing possibility, however, with the file descriptor for uffd. In the case of a fork(), the file descriptor would be shared, and so either fork could end up listening via poll/select.
It's hard to decide what is right semantically, because the normal use of userfaultfd is to monitor from another process, though you can use read() in the same process as the faulting one. This seems to be because either fork() or a Unix socket can be the path for sending the file descriptor to another process. But this is just definitional; the actual user design would have to handle faults in one place or another.
Now in this case, whichever process does the first read() on the file descriptor would get the information about the minor fault. (I assume both would NOT, but I'm early in my use of userfaultfd). So it could continue or copy, as desired.
Generally, anyone using userfaultfd would understand the nuances of fork() and file handle duplication. So they would probably close the fd in one process or the other, as appropriate. (I admit I haven't tested what happens if both forks try to use the file descriptor, but I can imagine it might be useful if they coordinate carefully).
Now, if many forks end up sharing the uffd file descriptor and also end up with copy-on-write shared pages in the MAP_PRIVATE region, the above definitions of the continue and copy would continue to make sense - to me anyway.
Hope this helps