From: "David P. Reed" <dpreed@deepplum.com>
To: "Axel Rasmussen" <axelrasmussen@google.com>
Cc: "Peter Xu" <peterx@redhat.com>,
"James Houghton" <jthoughton@google.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
linux-mm@kvack.org
Subject: Re: PROBLEM: userfaultfd REGISTER minor mode on MAP_PRIVATE range fails
Date: Fri, 19 Sep 2025 14:29:20 -0400 (EDT)
Message-ID: <1758306560.96630670@apps.rackspace.com>
In-Reply-To: <CAJHvVchHKxiVKFjUz4ir4PVDvUihLhiSRMBWqpMEZfwLdereuA@mail.gmail.com>
On Wednesday, September 17, 2025 12:13, "Axel Rasmussen" <axelrasmussen@google.com> said:
> On Tue, Sep 16, 2025 at 12:52 PM David P. Reed <dpreed@deepplum.com> wrote:
>>
>>
>>
>> On Tuesday, September 16, 2025 14:35, "Axel Rasmussen" <axelrasmussen@google.com>
>> said:
>>
>> > On Tue, Sep 16, 2025 at 10:27 AM David P. Reed <dpreed@deepplum.com>
>> wrote:
>> >
>> >> Than -
>> >>
>> >> Just to clarify -
>> >> Looking at the man page for UFFDIO_API, there are two "feature bits" that
>> >> indicate cases where "minor" handling is now supported, and can be enabled.
>> >> UFFD_FEATURE_MINOR_HUGETLBFS and UFFD_FEATURE_MINOR_SHMEM
>> >> In my reading of the documents, these seem to imply that before they were
>> >> added as new features, that MAP_PRIVATE|MAP_ANONYMOUS mappings were
>> >> supported, and that the "new" additions to the MINOR mode were just for
>> >> HUGETLBFS and MAP_SHARED cases.
>> >>
>> >
>> > Actually minor fault support didn't exist at all before those two features
>> > were added. :)
>>
>> Thanks for commenting. I'm not sure that's exactly true. Why is SHMEM
>> (MAP_SHARED) supported, but not ordinary pages? I wasn't party to the evolution
>> here, but so far no one has explained why there's a special difference between
>> SHMEM and ordinary VMAs.
>
> I promise it's true, I wrote the UFFD minor fault handling feature. :)
OK, but I am still confused as to why SHMEM VMAs are supported and non-SHMEM VMAs are not, in the case of an anonymously mapped range.
>
> As for why... Like I said above, UFFD calls it a "minor" fault if the
> PTE doesn't exist, but the page already exists in the page cache. If
> the PTE does exist, you won't get either a minor *or* a missing fault.
> If the page does not already exist in the page cache, you'll get a
> missing fault, not a minor fault.
I'm assuming you understand that there is a profound difference between the "page cache" and the "swap cache" in Linux. I am referring to what happens when a page is in the swap cache (which is primarily about anonymous pages, though a weird corner case is that tmpfs is backed by the swap cache and the swap system, not by the page cache).
The "historical reasons" for the swap cache not being the page cache are weirdly difficult to decode. I've spent a chunk of months trying to research how this came about and, more importantly, why. No luck on the why. (If I had to guess, the main reason seems to be that the folks who built it wanted to avoid using inodes, which the whole page cache mechanism requires, perhaps because they thought inodes were "expensive".)
Anyway, I now understand that UFFD has chosen a variant meaning of "minor page fault" that seems tied to pages that are file-backed or SHMEM.
A "swapped" page is anonymous by definition of what "swap" means in Linux. In Unix and other systems, "swapping" was a generic term that covered file-backed paging as well as paging of non-file-backed pages.
Anyway, I'm quite puzzled why I can't seem to monitor MAP_PRIVATE|MAP_ANONYMOUS page faults with userfaultfd. The reason I focus on CoW is that CoW and fork() behavior is basically the only user-visible difference between MAP_PRIVATE and MAP_SHARED. And if you read random examples of how to use mmap(), MAP_PRIVATE is quite often suggested as if it were the "normal" usage (despite what happens on fork()).
>
> So "ordinary" VMAs are not supported because I don't think there is
> any way to create that condition with them? If you just
> mmap(MAP_ANON|MAP_PRIVATE), those pages will never be in the page
> cache, right? How would you go about doing so? You don't have an fd,
> you can't fallocate it. If you specified MAP_POPULATE, the PTEs would
> also be installed, so you just wouldn't get userfaults at all. If you
> create the mapping, then fork, then write to it in the child, I think
> the pages just get CoWed, I don't think userfaults are generated for
> that, because the PTE was already there (albeit, with RO permissions).
>
> I guess maybe a way to make progress here is, can you list out what
> sequence of steps you believe should result in a UFFD minor fault?
> Like (for example):
>
> fd = memfd_create()
> fallocate(fd, 0, 0, size)
> mmap(fd, MAP_PRIVATE)
> /* register mapping for UFFD minor faults */
> /* read or write to mapping */
>
> Now we get a minor fault.
>
>
>
>>
>> >
>> > You are right that userfaultfd's use of "minor fault" is (unfortunately)
>> > slightly different from the meaning in other contexts. I think the more
>> > normal meaning is, faults which do not incur I/O (i.e., swap faults and
>> > file faults [i.e., faults on non-swap-backed pages] are major, other faults
>> > are minor).
>> >
>> > For userfaultfd, a minor fault is a fault where the page already exists in
>> > the page cache, but the page table entry wasn't set up. I don't think that
>> > scenario can ever happen for anonymous, private mappings, so it doesn't
>> > really make sense to be able to register such mappings in this mode. If you
>> > create a mapping with mmap(MAP_ANON|MAP_PRIVATE) and then access it (read
>> > or write), that fault requires allocation of a new page, so userfaultfd
>> > does not consider that a "minor fault". My recollection though is if you
>> > make a file on tmpfs or hugetlbfs, fallocate() it or whatever, and you
>> > MAP_PRIVATE that file, *that* registration will work.
>> >
>> >
>> >>
>> >> It seems odd that anonymous page faults and COW would not be handled,
>> >> given that context.
>> >>
>> >> Anyway, none of the documentation makes that clear. This just adds to my
>> >> last response where I explain my use case.
>> >>
>> >>
>> >>
>> >
>>
>>
>
Thread overview: 26+ messages
2025-09-15 20:13 David P. Reed
2025-09-15 20:24 ` James Houghton
2025-09-15 22:58 ` David P. Reed
2025-09-16 0:31 ` James Houghton
2025-09-16 14:48 ` Peter Xu
2025-09-16 15:52 ` David P. Reed
2025-09-16 16:13 ` Peter Xu
2025-09-16 17:09 ` David P. Reed
2025-09-26 22:16 ` Peter Xu
2025-09-16 17:27 ` David P. Reed
2025-09-16 18:35 ` Axel Rasmussen
2025-09-16 19:10 ` James Houghton
2025-09-16 19:47 ` David P. Reed
2025-09-16 22:04 ` Axel Rasmussen
2025-09-26 22:00 ` Peter Xu
2025-09-16 19:52 ` David P. Reed
2025-09-17 16:13 ` Axel Rasmussen
2025-09-19 18:29 ` David P. Reed [this message]
2025-09-25 19:20 ` Axel Rasmussen
2025-09-27 18:45 ` David P. Reed
2025-09-29 5:30 ` James Houghton
2025-09-29 19:44 ` David P. Reed
2025-09-29 20:30 ` Peter Xu
2025-10-01 22:16 ` Axel Rasmussen
2025-10-17 21:07 ` David P. Reed
2025-09-16 15:37 ` David P. Reed