From: Axel Rasmussen <axelrasmussen@google.com>
To: "David P. Reed" <dpreed@deepplum.com>
Cc: Peter Xu <peterx@redhat.com>,
James Houghton <jthoughton@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org
Subject: Re: PROBLEM: userfaultfd REGISTER minor mode on MAP_PRIVATE range fails
Date: Wed, 17 Sep 2025 09:13:14 -0700
Message-ID: <CAJHvVchHKxiVKFjUz4ir4PVDvUihLhiSRMBWqpMEZfwLdereuA@mail.gmail.com>
In-Reply-To: <1758052343.971831541@apps.rackspace.com>
On Tue, Sep 16, 2025 at 12:52 PM David P. Reed <dpreed@deepplum.com> wrote:
>
>
>
> On Tuesday, September 16, 2025 14:35, "Axel Rasmussen" <axelrasmussen@google.com> said:
>
> > On Tue, Sep 16, 2025 at 10:27 AM David P. Reed <dpreed@deepplum.com> wrote:
> >
> >> Than -
> >>
> >> Just to clarify -
> >> Looking at the man page for UFFDIO_API, there are two "feature bits" that
> >> indicate cases where "minor" handling is now supported, and can be enabled.
> >> UFFD_FEATURE_MINOR_HUGETLBFS and UFFD_FEATURE_MINOR_SHMEM
> >> In my reading of the documents, these seem to imply that before they were
> >> added as new features, that MAP_PRIVATE|MAP_ANONYMOUS mappings were
> >> supported, and that the "new" additions to the MINOR mode were just for
> >> HUGETLBFS and MAP_SHARED cases.
> >>
> >
> > Actually minor fault support didn't exist at all before those two features
> > were added. :)
>
> Thanks for commenting. I'm not sure that's exactly true. Why is SHMEM (MAP_SHARED) supported, but not ordinary pages? I wasn't party to the evolution here, but so far no one has explained why there's a special difference between SHMEM and ordinary VMAs.

I promise it's true, I wrote the UFFD minor fault handling feature. :)

As for why... Like I said above, UFFD calls it a "minor" fault if the
PTE doesn't exist, but the page already exists in the page cache. If
the PTE does exist, you won't get either a minor *or* a missing fault.
If the page does not already exist in the page cache, you'll get a
missing fault, not a minor fault.
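
The rule above can be written down as a small decision function. This is only a toy model of the description in this message, not kernel code: the enum and function names are hypothetical and do not correspond to any kernel identifiers.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; these are not kernel identifiers. */
enum uffd_fault_kind {
    UFFD_FAULT_NONE,    /* PTE present: neither a minor nor a missing fault */
    UFFD_FAULT_MINOR,   /* no PTE, but the page is already in the page cache */
    UFFD_FAULT_MISSING  /* no PTE and no page in the page cache */
};

/* Classify a fault the way the paragraph above describes it. */
enum uffd_fault_kind classify_fault(bool pte_present, bool in_page_cache)
{
    if (pte_present)
        return UFFD_FAULT_NONE;
    return in_page_cache ? UFFD_FAULT_MINOR : UFFD_FAULT_MISSING;
}
```

In this model, a fresh mmap(MAP_ANON|MAP_PRIVATE) page corresponds to classify_fault(false, false), which is why it can only ever produce a missing fault.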
So "ordinary" VMAs are not supported because I don't think there is
any way to create that condition with them? If you just
mmap(MAP_ANON|MAP_PRIVATE), those pages will never be in the page
cache, right? How would you go about doing so? You don't have an fd,
you can't fallocate it. If you specified MAP_POPULATE, the PTEs would
also be installed, so you just wouldn't get userfaults at all. If you
create the mapping, then fork, then write to it in the child, I think
the pages just get CoWed, I don't think userfaults are generated for
that, because the PTE was already there (albeit, with RO permissions).
I guess maybe a way to make progress here is, can you list out what
sequence of steps you believe should result in a UFFD minor fault?
Like (for example):

    fd = memfd_create()
    fallocate(fd, 0, 0, size)
    mmap(fd, MAP_PRIVATE)
    /* register mapping for UFFD minor faults */
    /* read or write to mapping */

Now we get a minor fault.
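
A minimal C sketch of the registration part of that sequence, for anyone who wants to try it. It makes several assumptions: kernel headers new enough to define UFFD_FEATURE_MINOR_SHMEM and UFFD_USER_MODE_ONLY, a glibc that exposes memfd_create() and fallocate(), and a system that permits userfaultfd() for the calling process. The helper name try_register_minor and the memfd name are hypothetical. It only attempts the registration and reports the result; it does not handle faults.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to register a mapping for UFFD minor faults.
 *   use_memfd != 0: MAP_PRIVATE mapping of a fallocate()d memfd.
 *   use_memfd == 0: plain MAP_ANONYMOUS|MAP_PRIVATE mapping, for comparison.
 * Returns 0 if UFFDIO_REGISTER succeeded, otherwise a negative errno value
 * from whichever step failed.
 */
int try_register_minor(int use_memfd)
{
    struct uffdio_api api;
    struct uffdio_register reg;
    void *addr = MAP_FAILED;
    int fd = -1, uffd = -1, ret;
    long page = sysconf(_SC_PAGESIZE);
    size_t len;

    assert(page > 0);
    len = 4 * (size_t)page;

    if (use_memfd) {
        fd = memfd_create("uffd-minor-demo", MFD_CLOEXEC);
        if (fd < 0) { ret = -errno; goto out; }
        /* Put pages in the page cache without installing any PTEs. */
        if (fallocate(fd, 0, 0, (off_t)len) < 0) { ret = -errno; goto out; }
        addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    } else {
        addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    }
    if (addr == MAP_FAILED) { ret = -errno; goto out; }

    /* UFFD_USER_MODE_ONLY avoids needing extra privileges on many systems. */
    uffd = (int)syscall(SYS_userfaultfd,
                        O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
    if (uffd < 0) { ret = -errno; goto out; }

    /* Handshake, asking for shmem minor fault support. */
    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    api.features = UFFD_FEATURE_MINOR_SHMEM;
    if (ioctl(uffd, UFFDIO_API, &api) < 0) { ret = -errno; goto out; }

    /* Register the whole mapping in minor mode. */
    memset(&reg, 0, sizeof(reg));
    reg.range.start = (unsigned long)addr;
    reg.range.len = len;
    reg.mode = UFFDIO_REGISTER_MODE_MINOR;
    ret = ioctl(uffd, UFFDIO_REGISTER, &reg) < 0 ? -errno : 0;

out:
    if (uffd >= 0)
        close(uffd);
    if (addr != MAP_FAILED)
        munmap(addr, len);
    if (fd >= 0)
        close(fd);
    return ret;
}
```

Calling try_register_minor(1) exercises the memfd case; try_register_minor(0) exercises the anonymous private case, which per the subject of this thread is expected to fail at UFFDIO_REGISTER. The exact result depends on the kernel version and on whether userfaultfd is permitted for the caller.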
>
> >
> > You are right that userfaultfd's use of "minor fault" is (unfortunately)
> > slightly different from the meaning in other contexts. I think the more
> > normal meaning is, faults which do not incur I/O (i.e., swap faults and
> > file faults [i.e., faults on non-swap-backed pages] are major, other faults
> > are minor).
> >
> > For userfaultfd, a minor fault is a fault where the page already exists in
> > the page cache, but the page table entry wasn't set up. I don't think that
> > scenario can ever happen for anonymous, private mappings, so it doesn't
> > really make sense to be able to register such mappings in this mode. If you
> > create a mapping with mmap(MAP_ANON|MAP_PRIVATE) and then access it (read
> > or write), that fault requires allocation of a new page, so userfaultfd
> > does not consider that a "minor fault". My recollection though is if you
> > make a file on tmpfs or hugetlbfs, fallocate() it or whatever, and you
> > MAP_PRIVATE that file, *that* registration will work.
> >
> >
> >>
> >> It seems odd that anonymous page faults and COW would not be handled,
> >> given that context.
> >>
> >> Anyway, that's unclear in any of the documentation. This just adds to my
> >> last response where I explain my use case.
> >>
> >>
> >>
> >
>
>