* [PATCH 0/2] mm/userfaultfd: Fix uninitialized output field for -EAGAIN race
@ 2025-04-24 21:57 Peter Xu
2025-04-24 21:57 ` [PATCH 1/2] " Peter Xu
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Peter Xu @ 2025-04-24 21:57 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Mike Rapoport, James Houghton, David Hildenbrand,
Suren Baghdasaryan, Axel Rasmussen, Andrew Morton, peterx
When discussing some userfaultfd issues with Andrea, Andrea pointed out an
ABI issue with userfaultfd that existed for years. Luckily the issue
should only be a very corner case one, and the fix (even if changing the
kernel ABI) should only be in the good way, IOW there should have no risk
breaking any userapp but only fixing.
This issue also should not matter if the userapp didn't enable any of the
UFFD_FEATURE_EVENT_* feature.
The first patch contains more information on the issue and the fix. The
2nd patch is a test case I added which would fail on old kernels (including
current latest branches) but will start working after patch 1 applied.
Thanks,
Peter Xu (2):
mm/userfaultfd: Fix uninitialized output field for -EAGAIN race
mm/selftests: Add a test to verify mmap_changing race with -EAGAIN
fs/userfaultfd.c | 28 ++-
tools/testing/selftests/mm/uffd-unit-tests.c | 203 +++++++++++++++++++
2 files changed, 225 insertions(+), 6 deletions(-)
--
2.48.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] mm/userfaultfd: Fix uninitialized output field for -EAGAIN race
2025-04-24 21:57 [PATCH 0/2] mm/userfaultfd: Fix uninitialized output field for -EAGAIN race Peter Xu
@ 2025-04-24 21:57 ` Peter Xu
2025-04-25 15:12 ` David Hildenbrand
2025-04-24 21:57 ` [PATCH 2/2] mm/selftests: Add a test to verify mmap_changing race with -EAGAIN Peter Xu
2025-04-25 15:45 ` [PATCH 0/2] mm/userfaultfd: Fix uninitialized output field for -EAGAIN race James Houghton
2 siblings, 1 reply; 10+ messages in thread
From: Peter Xu @ 2025-04-24 21:57 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Mike Rapoport, James Houghton, David Hildenbrand,
Suren Baghdasaryan, Axel Rasmussen, Andrew Morton, peterx,
linux-stable, Andrea Arcangeli
While discussing some userfaultfd relevant issues recently, Andrea noticed
a potential ABI breakage with -EAGAIN on almost all userfaultfd ioctl()s.
Quote from Andrea, explaining how -EAGAIN was processed, and how this
should fix it (taking example of UFFDIO_COPY ioctl):
The "mmap_changing" and "stale pmd" conditions are already reported as
-EAGAIN written in the copy field, this does not change it. This change
removes the subnormal case that left copy.copy uninitialized and required
apps to explicitly set the copy field to get deterministic
behavior (which is a requirement contrary to the documentation in both
the manpage and source code). In turn there's no alteration to backwards
compatibility as result of this change because userland will find the
copy field consistently set to -EAGAIN, and not anymore sometime -EAGAIN
and sometime uninitialized.
Even then the change only can make a difference to non cooperative users
of userfaultfd, so when UFFD_FEATURE_EVENT_* is enabled, which is not
true for the vast majority of apps using userfaultfd or this unintended
uninitialized field may have been noticed sooner.
Meanwhile, since this bug existed for years, it also almost affects all
ioctl()s that was introduced later. Besides UFFDIO_ZEROPAGE, these also
get affected in the same way:
- UFFDIO_CONTINUE
- UFFDIO_POISON
- UFFDIO_MOVE
This patch should have fixed all of them.
Fixes: df2cc96e7701 ("userfaultfd: prevent non-cooperative events vs mcopy_atomic races")
Fixes: f619147104c8 ("userfaultfd: add UFFDIO_CONTINUE ioctl")
Fixes: fc71884a5f59 ("mm: userfaultfd: add new UFFDIO_POISON ioctl")
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Cc: linux-stable <stable@vger.kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
fs/userfaultfd.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index d80f94346199..22f4bf956ba1 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1585,8 +1585,11 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
user_uffdio_copy = (struct uffdio_copy __user *) arg;
ret = -EAGAIN;
- if (atomic_read(&ctx->mmap_changing))
+ if (unlikely(atomic_read(&ctx->mmap_changing))) {
+ if (unlikely(put_user(ret, &user_uffdio_copy->copy)))
+ return -EFAULT;
goto out;
+ }
ret = -EFAULT;
if (copy_from_user(&uffdio_copy, user_uffdio_copy,
@@ -1641,8 +1644,11 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
user_uffdio_zeropage = (struct uffdio_zeropage __user *) arg;
ret = -EAGAIN;
- if (atomic_read(&ctx->mmap_changing))
+ if (unlikely(atomic_read(&ctx->mmap_changing))) {
+ if (unlikely(put_user(ret, &user_uffdio_zeropage->zeropage)))
+ return -EFAULT;
goto out;
+ }
ret = -EFAULT;
if (copy_from_user(&uffdio_zeropage, user_uffdio_zeropage,
@@ -1744,8 +1750,11 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
user_uffdio_continue = (struct uffdio_continue __user *)arg;
ret = -EAGAIN;
- if (atomic_read(&ctx->mmap_changing))
+ if (unlikely(atomic_read(&ctx->mmap_changing))) {
+ if (unlikely(put_user(ret, &user_uffdio_continue->mapped)))
+ return -EFAULT;
goto out;
+ }
ret = -EFAULT;
if (copy_from_user(&uffdio_continue, user_uffdio_continue,
@@ -1801,8 +1810,11 @@ static inline int userfaultfd_poison(struct userfaultfd_ctx *ctx, unsigned long
user_uffdio_poison = (struct uffdio_poison __user *)arg;
ret = -EAGAIN;
- if (atomic_read(&ctx->mmap_changing))
+ if (unlikely(atomic_read(&ctx->mmap_changing))) {
+ if (unlikely(put_user(ret, &user_uffdio_poison->updated)))
+ return -EFAULT;
goto out;
+ }
ret = -EFAULT;
if (copy_from_user(&uffdio_poison, user_uffdio_poison,
@@ -1870,8 +1882,12 @@ static int userfaultfd_move(struct userfaultfd_ctx *ctx,
user_uffdio_move = (struct uffdio_move __user *) arg;
- if (atomic_read(&ctx->mmap_changing))
- return -EAGAIN;
+ ret = -EAGAIN;
+ if (unlikely(atomic_read(&ctx->mmap_changing))) {
+ if (unlikely(put_user(ret, &user_uffdio_move->move)))
+ return -EFAULT;
+ goto out;
+ }
if (copy_from_user(&uffdio_move, user_uffdio_move,
/* don't copy "move" last field */
--
2.48.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2] mm/selftests: Add a test to verify mmap_changing race with -EAGAIN
2025-04-24 21:57 [PATCH 0/2] mm/userfaultfd: Fix uninitialized output field for -EAGAIN race Peter Xu
2025-04-24 21:57 ` [PATCH 1/2] " Peter Xu
@ 2025-04-24 21:57 ` Peter Xu
2025-04-25 15:45 ` [PATCH 0/2] mm/userfaultfd: Fix uninitialized output field for -EAGAIN race James Houghton
2 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2025-04-24 21:57 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Mike Rapoport, James Houghton, David Hildenbrand,
Suren Baghdasaryan, Axel Rasmussen, Andrew Morton, peterx
Add an unit test to verify the recent mmap_changing ABI breakage.
Note that I used some tricks here and there to make the test simple, e.g. I
abused UFFDIO_MOVE on top of shmem with the fact that I know what I want to
test will be even earlier than the vma type check. Rich comments were
added to explain trivial details.
Before that fix, -EAGAIN would have been written to the copy field most of
the time but not always; the test should be able to reliably trigger the
outlier case. After the fix, it's written always, the test verifies that
making sure corresponding field (e.g. copy.copy for UFFDIO_COPY) is updated.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
tools/testing/selftests/mm/uffd-unit-tests.c | 203 +++++++++++++++++++
1 file changed, 203 insertions(+)
diff --git a/tools/testing/selftests/mm/uffd-unit-tests.c b/tools/testing/selftests/mm/uffd-unit-tests.c
index e8fd9011c2a3..e3b5046d02d8 100644
--- a/tools/testing/selftests/mm/uffd-unit-tests.c
+++ b/tools/testing/selftests/mm/uffd-unit-tests.c
@@ -1231,6 +1231,183 @@ static void uffd_move_pmd_split_test(uffd_test_args_t *targs)
uffd_move_pmd_handle_fault);
}
+static bool
+uffdio_verify_results(const char *name, int ret, int error, long result)
+{
+ /*
+ * Should always return -1 with errno=EAGAIN, with corresponding
+ * result field updated in ioctl() args to be -EAGAIN too
+ * (e.g. copy.copy field for UFFDIO_COPY).
+ */
+ if (ret != -1) {
+ uffd_test_fail("%s should have returned -1", name);
+ return false;
+ }
+
+ if (error != EAGAIN) {
+ uffd_test_fail("%s should have errno==EAGAIN", name);
+ return false;
+ }
+
+ if (result != -EAGAIN) {
+ uffd_test_fail("%s should have been updated for -EAGAIN",
+ name);
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * This defines a function to test one ioctl. Note that here "field" can
+ * be 1 or anything not -EAGAIN. With that initial value set, we can
+ * verify later that it should be updated by kernel (when -EAGAIN
+ * returned), by checking whether it is also updated to -EAGAIN.
+ */
+#define DEFINE_MMAP_CHANGING_TEST(name, ioctl_name, field) \
+ static bool uffdio_mmap_changing_test_##name(int fd) \
+ { \
+ int ret; \
+ struct uffdio_##name args = { \
+ .field = 1, \
+ }; \
+ ret = ioctl(fd, ioctl_name, &args); \
+ return uffdio_verify_results(#ioctl_name, ret, errno, args.field); \
+ }
+
+DEFINE_MMAP_CHANGING_TEST(zeropage, UFFDIO_ZEROPAGE, zeropage)
+DEFINE_MMAP_CHANGING_TEST(copy, UFFDIO_COPY, copy)
+DEFINE_MMAP_CHANGING_TEST(move, UFFDIO_MOVE, move)
+DEFINE_MMAP_CHANGING_TEST(poison, UFFDIO_POISON, updated)
+DEFINE_MMAP_CHANGING_TEST(continue, UFFDIO_CONTINUE, mapped)
+
+typedef enum {
+ /* We actually do not care about any state except UNINTERRUPTIBLE.. */
+ THR_STATE_UNKNOWN = 0,
+ THR_STATE_UNINTERRUPTIBLE,
+} thread_state;
+
+static void sleep_short(void)
+{
+ usleep(1000);
+}
+
+static thread_state thread_state_get(pid_t tid)
+{
+ const char *header = "State:\t";
+ char tmp[256], *p, c;
+ FILE *fp;
+
+ snprintf(tmp, sizeof(tmp), "/proc/%d/status", tid);
+ fp = fopen(tmp, "r");
+
+ if (!fp) {
+ return THR_STATE_UNKNOWN;
+ }
+
+ while (fgets(tmp, sizeof(tmp), fp)) {
+ p = strstr(tmp, header);
+ if (p) {
+ /* For example, "State:\tD (disk sleep)" */
+ c = *(p + sizeof(header) - 1);
+ return c == 'D' ?
+ THR_STATE_UNINTERRUPTIBLE : THR_STATE_UNKNOWN;
+ }
+ }
+
+ return THR_STATE_UNKNOWN;
+}
+
+static void thread_state_until(pid_t tid, thread_state state)
+{
+ thread_state s;
+
+ do {
+ s = thread_state_get(tid);
+ sleep_short();
+ } while (s != state);
+}
+
+static void *uffd_mmap_changing_thread(void *opaque)
+{
+ volatile pid_t *pid = opaque;
+ int ret;
+
+ /* Unfortunately, it's only fetch-able from the thread itself.. */
+ assert(*pid == 0);
+ *pid = syscall(SYS_gettid);
+
+ /* Inject an event, this will hang solid until the event read */
+ ret = madvise(area_dst, page_size, MADV_REMOVE);
+ if (ret)
+ err("madvise(MADV_REMOVE) failed");
+
+ return NULL;
+}
+
+static void uffd_consume_message(int fd)
+{
+ struct uffd_msg msg = { 0 };
+
+ while (uffd_read_msg(fd, &msg));
+}
+
+static void uffd_mmap_changing_test(uffd_test_args_t *targs)
+{
+ /*
+ * This stores the real PID (which can be different from how tid is
+ * defined..) for the child thread, 0 means not initialized.
+ */
+ pid_t pid = 0;
+ pthread_t tid;
+ int ret;
+
+ if (uffd_register(uffd, area_dst, nr_pages * page_size,
+ true, false, false))
+ err("uffd_register() failed");
+
+ /* Create a thread to generate the racy event */
+ ret = pthread_create(&tid, NULL, uffd_mmap_changing_thread, &pid);
+ if (ret)
+ err("pthread_create() failed");
+
+ /*
+ * Wait until the thread setup the pid. Use volatile to make sure
+ * it reads from RAM not regs.
+ */
+ while (!(volatile pid_t)pid)
+ sleep_short();
+
+ /* Wait until the thread hangs at REMOVE event */
+ thread_state_until(pid, THR_STATE_UNINTERRUPTIBLE);
+
+ if (!uffdio_mmap_changing_test_copy(uffd))
+ return;
+
+ if (!uffdio_mmap_changing_test_zeropage(uffd))
+ return;
+
+ if (!uffdio_mmap_changing_test_move(uffd))
+ return;
+
+ if (!uffdio_mmap_changing_test_poison(uffd))
+ return;
+
+ if (!uffdio_mmap_changing_test_continue(uffd))
+ return;
+
+ /*
+ * All succeeded above! Recycle everything. Start by reading the
+ * event so as to kick the thread roll again..
+ */
+ uffd_consume_message(uffd);
+
+ ret = pthread_join(tid, NULL);
+ assert(ret == 0);
+
+ uffd_test_pass();
+}
+
static int prevent_hugepages(const char **errmsg)
{
/* This should be done before source area is populated */
@@ -1470,6 +1647,32 @@ uffd_test_case_t uffd_tests[] = {
.mem_targets = MEM_ALL,
.uffd_feature_required = UFFD_FEATURE_POISON,
},
+ {
+ .name = "mmap-changing",
+ .uffd_fn = uffd_mmap_changing_test,
+ /*
+ * There's no point running this test over all mem types as
+ * they share the same code paths.
+ *
+ * Choose shmem for simplicity, because (1) shmem supports
+ * MINOR mode to cover UFFDIO_CONTINUE, and (2) shmem is
+ * almost always available (unlike hugetlb). Here we
+ * abused SHMEM for UFFDIO_MOVE, but the test we want to
+ * cover doesn't yet need the correct memory type..
+ */
+ .mem_targets = MEM_SHMEM,
+ /*
+ * Any UFFD_FEATURE_EVENT_* should work to trigger the
+ * race logically, but choose the simplest (REMOVE).
+ *
+ * Meanwhile, since we'll cover quite a few new ioctl()s
+ * (CONTINUE, POISON, MOVE), skip this test for old kernels
+ * by choosing all of them.
+ */
+ .uffd_feature_required = UFFD_FEATURE_EVENT_REMOVE |
+ UFFD_FEATURE_MOVE | UFFD_FEATURE_POISON |
+ UFFD_FEATURE_MINOR_SHMEM,
+ },
};
static void usage(const char *prog)
--
2.48.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] mm/userfaultfd: Fix uninitialized output field for -EAGAIN race
2025-04-24 21:57 ` [PATCH 1/2] " Peter Xu
@ 2025-04-25 15:12 ` David Hildenbrand
2025-04-25 15:54 ` James Houghton
0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2025-04-25 15:12 UTC (permalink / raw)
To: Peter Xu, linux-kernel, linux-mm
Cc: Mike Rapoport, James Houghton, Suren Baghdasaryan,
Axel Rasmussen, Andrew Morton, linux-stable, Andrea Arcangeli
On 24.04.25 23:57, Peter Xu wrote:
> While discussing some userfaultfd relevant issues recently, Andrea noticed
> a potential ABI breakage with -EAGAIN on almost all userfaultfd ioctl()s.
I guess we talk about e.g., "man UFFDIO_COPY" documentation:
"The copy field is used by the kernel to return the number of bytes that
was actually copied, or an error (a negated errno-style value). The
copy field is output-only; it is not read by the UFFDIO_COPY operation."
I assume -EINVAL/-ESRCH/-EFAULT are excluded from that rule, because
there is no sense in user-space trying again on these errors either way.
Well, there are cases where we would store -EFAULT, when we receive it
from mfill_atomic_copy().
So if we store -EAGAIN to copy.copy it says "we didn't copy anything".
(probably just storing 0 would have been better, but I am sure there was
a reason to indicate negative errors in addition to returning an error)
>
> Quote from Andrea, explaining how -EAGAIN was processed, and how this
> should fix it (taking example of UFFDIO_COPY ioctl):
>
> The "mmap_changing" and "stale pmd" conditions are already reported as
> -EAGAIN written in the copy field, this does not change it. This change
> removes the subnormal case that left copy.copy uninitialized and required
> apps to explicitly set the copy field to get deterministic
> behavior (which is a requirement contrary to the documentation in both
> the manpage and source code). In turn there's no alteration to backwards
> compatibility as result of this change because userland will find the
> copy field consistently set to -EAGAIN, and not anymore sometime -EAGAIN
> and sometime uninitialized.
>
> Even then the change only can make a difference to non cooperative users
> of userfaultfd, so when UFFD_FEATURE_EVENT_* is enabled, which is not
> true for the vast majority of apps using userfaultfd or this unintended
> uninitialized field may have been noticed sooner.
>
> Meanwhile, since this bug existed for years, it also almost affects all
> ioctl()s that was introduced later. Besides UFFDIO_ZEROPAGE, these also
> get affected in the same way:
>
> - UFFDIO_CONTINUE
> - UFFDIO_POISON
> - UFFDIO_MOVE
>
> This patch should have fixed all of them.
>
> Fixes: df2cc96e7701 ("userfaultfd: prevent non-cooperative events vs mcopy_atomic races")
> Fixes: f619147104c8 ("userfaultfd: add UFFDIO_CONTINUE ioctl")
> Fixes: fc71884a5f59 ("mm: userfaultfd: add new UFFDIO_POISON ioctl")
> Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
> Cc: linux-stable <stable@vger.kernel.org>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Axel Rasmussen <axelrasmussen@google.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Reported-by: Andrea Arcangeli <aarcange@redhat.com>
> Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> fs/userfaultfd.c | 28 ++++++++++++++++++++++------
> 1 file changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index d80f94346199..22f4bf956ba1 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1585,8 +1585,11 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
> user_uffdio_copy = (struct uffdio_copy __user *) arg;
>
> ret = -EAGAIN;
> - if (atomic_read(&ctx->mmap_changing))
> + if (unlikely(atomic_read(&ctx->mmap_changing))) {
> + if (unlikely(put_user(ret, &user_uffdio_copy->copy)))
> + return -EFAULT;
> goto out;
> + }
Nit: It's weird that we do "return -EFAULT" in one case, in the other we
do "goto out;" which ends up doing a "return ret" ...
Maybe to keep it consistent:
ret = -EAGAIN;
if (unlikely(atomic_read(&ctx->mmap_changing))) {
if (unlikely(put_user(ret, &user_uffdio_copy->copy)))
ret = -EFAULT;
goto out;
}
In all of these functions, we should probably just get rid of the "goto
out" and just return directly. We have a weird mixture of "goto out;"
and return; ... a different cleanup.
Reviewed-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] mm/userfaultfd: Fix uninitialized output field for -EAGAIN race
2025-04-24 21:57 [PATCH 0/2] mm/userfaultfd: Fix uninitialized output field for -EAGAIN race Peter Xu
2025-04-24 21:57 ` [PATCH 1/2] " Peter Xu
2025-04-24 21:57 ` [PATCH 2/2] mm/selftests: Add a test to verify mmap_changing race with -EAGAIN Peter Xu
@ 2025-04-25 15:45 ` James Houghton
2025-04-25 15:58 ` David Hildenbrand
2 siblings, 1 reply; 10+ messages in thread
From: James Houghton @ 2025-04-25 15:45 UTC (permalink / raw)
To: Peter Xu
Cc: linux-kernel, linux-mm, Mike Rapoport, David Hildenbrand,
Suren Baghdasaryan, Axel Rasmussen, Andrew Morton
On Thu, Apr 24, 2025 at 5:57 PM Peter Xu <peterx@redhat.com> wrote:
>
> When discussing some userfaultfd issues with Andrea, Andrea pointed out an
> ABI issue with userfaultfd that existed for years. Luckily the issue
> should only be a very corner case one, and the fix (even if changing the
> kernel ABI) should only be in the good way, IOW there should have no risk
> breaking any userapp but only fixing.
FWIW, my userspace basically looks like this:
struct uffdio_continue uffdio_continue;
int64_t target_len = /* whatever */;
int64_t bytes_mapped = 0;
int ioctl_ret;
do {
uffdio_continue.range = /* whatever */;
uffdio_continue.mapped = 0;
ioctl_ret = ioctl(uffd, UFFDIO_CONTINUE, &uffdio_continue);
if (uffdio_continue.mapped < 0) { break; }
bytes_mapped += uffdio_continue.mapped;
} while (bytes_mapped < target_len && errno == EAGAIN);
I think your patch would indeed break this. (Perhaps I shouldn't be
reading from `mapped` without first checking that errno == EAGAIN.)
Well, that's what I would say, except in practice I never actually hit
the mmap_changing case while invoking UFFDIO_CONTINUE. :)
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] mm/userfaultfd: Fix uninitialized output field for -EAGAIN race
2025-04-25 15:12 ` David Hildenbrand
@ 2025-04-25 15:54 ` James Houghton
2025-04-25 16:27 ` Peter Xu
0 siblings, 1 reply; 10+ messages in thread
From: James Houghton @ 2025-04-25 15:54 UTC (permalink / raw)
To: David Hildenbrand
Cc: Peter Xu, linux-kernel, linux-mm, Mike Rapoport,
Suren Baghdasaryan, Axel Rasmussen, Andrew Morton, linux-stable,
Andrea Arcangeli
On Fri, Apr 25, 2025 at 11:12 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 24.04.25 23:57, Peter Xu wrote:
> > While discussing some userfaultfd relevant issues recently, Andrea noticed
> > a potential ABI breakage with -EAGAIN on almost all userfaultfd ioctl()s.
>
> I guess we talk about e.g., "man UFFDIO_COPY" documentation:
>
> "The copy field is used by the kernel to return the number of bytes that
> was actually copied, or an error (a negated errno-style value). The
> copy field is output-only; it is not read by the UFFDIO_COPY operation."
>
> I assume -EINVAL/-ESRCH/-EFAULT are excluded from that rule, because
> there is no sense in user-space trying again on these errors either way.
> Well, there are cases where we would store -EFAULT, when we receive it
> from mfill_atomic_copy().
>
> So if we store -EAGAIN to copy.copy it says "we didn't copy anything".
> (probably just storing 0 would have been better, but I am sure there was
> a reason to indicate negative errors in addition to returning an error)
IMHO, it makes more sense to store 0 than -EAGAIN (at least it will
mean that my userspace[1] won't break).
Userspace will need to know from where to restart the ioctl, and if we
put -EAGAIN in `mapped`/`copy`/etc., userspace will need to know that
-EAGAIN actually means 0 anyway.
[1]: https://lore.kernel.org/linux-mm/CADrL8HXuZkX0CP6apHLw0A0Ax4b4a+-=XEt0dH5mAKiN7hBv3w@mail.gmail.com/
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] mm/userfaultfd: Fix uninitialized output field for -EAGAIN race
2025-04-25 15:45 ` [PATCH 0/2] mm/userfaultfd: Fix uninitialized output field for -EAGAIN race James Houghton
@ 2025-04-25 15:58 ` David Hildenbrand
2025-04-25 16:07 ` James Houghton
0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2025-04-25 15:58 UTC (permalink / raw)
To: James Houghton, Peter Xu
Cc: linux-kernel, linux-mm, Mike Rapoport, Suren Baghdasaryan,
Axel Rasmussen, Andrew Morton
On 25.04.25 17:45, James Houghton wrote:
> On Thu, Apr 24, 2025 at 5:57 PM Peter Xu <peterx@redhat.com> wrote:
>>
>> When discussing some userfaultfd issues with Andrea, Andrea pointed out an
>> ABI issue with userfaultfd that existed for years. Luckily the issue
>> should only be a very corner case one, and the fix (even if changing the
>> kernel ABI) should only be in the good way, IOW there should have no risk
>> breaking any userapp but only fixing.
>
> FWIW, my userspace basically looks like this:
>
> struct uffdio_continue uffdio_continue;
> int64_t target_len = /* whatever */;
> int64_t bytes_mapped = 0;
> int ioctl_ret;
> do {
> uffdio_continue.range = /* whatever */;
> uffdio_continue.mapped = 0;
> ioctl_ret = ioctl(uffd, UFFDIO_CONTINUE, &uffdio_continue);
> if (uffdio_continue.mapped < 0) { break; }
> bytes_mapped += uffdio_continue.mapped;
> } while (bytes_mapped < target_len && errno == EAGAIN);
>
> I think your patch would indeed break this. (Perhaps I shouldn't be
> reading from `mapped` without first checking that errno == EAGAIN.)
>
> Well, that's what I would say, except in practice I never actually hit
> the mmap_changing case while invoking UFFDIO_CONTINUE. :)
Hm, but what if mfill_atomic_continue() would already return -EAGAIN
when checking mmap_changing etc?
Wouldn't code already run into an issue there?
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] mm/userfaultfd: Fix uninitialized output field for -EAGAIN race
2025-04-25 15:58 ` David Hildenbrand
@ 2025-04-25 16:07 ` James Houghton
2025-04-25 16:18 ` Peter Xu
0 siblings, 1 reply; 10+ messages in thread
From: James Houghton @ 2025-04-25 16:07 UTC (permalink / raw)
To: David Hildenbrand
Cc: Peter Xu, linux-kernel, linux-mm, Mike Rapoport,
Suren Baghdasaryan, Axel Rasmussen, Andrew Morton
On Fri, Apr 25, 2025 at 11:58 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 25.04.25 17:45, James Houghton wrote:
> > On Thu, Apr 24, 2025 at 5:57 PM Peter Xu <peterx@redhat.com> wrote:
> >>
> >> When discussing some userfaultfd issues with Andrea, Andrea pointed out an
> >> ABI issue with userfaultfd that existed for years. Luckily the issue
> >> should only be a very corner case one, and the fix (even if changing the
> >> kernel ABI) should only be in the good way, IOW there should have no risk
> >> breaking any userapp but only fixing.
> >
> > FWIW, my userspace basically looks like this:
> >
> > struct uffdio_continue uffdio_continue;
> > int64_t target_len = /* whatever */;
> > int64_t bytes_mapped = 0;
> > int ioctl_ret;
> > do {
> > uffdio_continue.range = /* whatever */;
> > uffdio_continue.mapped = 0;
> > ioctl_ret = ioctl(uffd, UFFDIO_CONTINUE, &uffdio_continue);
> > if (uffdio_continue.mapped < 0) { break; }
> > bytes_mapped += uffdio_continue.mapped;
> > } while (bytes_mapped < target_len && errno == EAGAIN);
> >
> > I think your patch would indeed break this. (Perhaps I shouldn't be
> > reading from `mapped` without first checking that errno == EAGAIN.)
> >
> > Well, that's what I would say, except in practice I never actually hit
> > the mmap_changing case while invoking UFFDIO_CONTINUE. :)
>
> Hm, but what if mfill_atomic_continue() would already return -EAGAIN
> when checking mmap_changing etc?
>
> Wouldn't code already run into an issue there?
Ah, thanks David. You're right, my code is already broken! :(
So given that we already have a case where -EAGAIN is put in the
output field, I change my mind, let's keep putting -EAGAIN in the
output field, and I'll go fix my code.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] mm/userfaultfd: Fix uninitialized output field for -EAGAIN race
2025-04-25 16:07 ` James Houghton
@ 2025-04-25 16:18 ` Peter Xu
0 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2025-04-25 16:18 UTC (permalink / raw)
To: James Houghton
Cc: David Hildenbrand, linux-kernel, linux-mm, Mike Rapoport,
Suren Baghdasaryan, Axel Rasmussen, Andrew Morton
On Fri, Apr 25, 2025 at 12:07:31PM -0400, James Houghton wrote:
> On Fri, Apr 25, 2025 at 11:58 AM David Hildenbrand <david@redhat.com> wrote:
> >
> > On 25.04.25 17:45, James Houghton wrote:
> > > On Thu, Apr 24, 2025 at 5:57 PM Peter Xu <peterx@redhat.com> wrote:
> > >>
> > >> When discussing some userfaultfd issues with Andrea, Andrea pointed out an
> > >> ABI issue with userfaultfd that existed for years. Luckily the issue
> > >> should only be a very corner case one, and the fix (even if changing the
> > >> kernel ABI) should only be in the good way, IOW there should have no risk
> > >> breaking any userapp but only fixing.
> > >
> > > FWIW, my userspace basically looks like this:
> > >
> > > struct uffdio_continue uffdio_continue;
> > > int64_t target_len = /* whatever */;
> > > int64_t bytes_mapped = 0;
> > > int ioctl_ret;
> > > do {
> > > uffdio_continue.range = /* whatever */;
> > > uffdio_continue.mapped = 0;
> > > ioctl_ret = ioctl(uffd, UFFDIO_CONTINUE, &uffdio_continue);
> > > if (uffdio_continue.mapped < 0) { break; }
> > > bytes_mapped += uffdio_continue.mapped;
> > > } while (bytes_mapped < target_len && errno == EAGAIN);
> > >
> > > I think your patch would indeed break this. (Perhaps I shouldn't be
> > > reading from `mapped` without first checking that errno == EAGAIN.)
> > >
> > > Well, that's what I would say, except in practice I never actually hit
> > > the mmap_changing case while invoking UFFDIO_CONTINUE. :)
> >
> > Hm, but what if mfill_atomic_continue() would already return -EAGAIN
> > when checking mmap_changing etc?
> >
> > Wouldn't code already run into an issue there?
>
> Ah, thanks David. You're right, my code is already broken! :(
>
> So given that we already have a case where -EAGAIN is put in the
> output field, I change my mind, let's keep putting -EAGAIN in the
> output field, and I'll go fix my code.
Thanks both for the comments.
AFAIU it shouldn't affect any app that doesn't use UFFD_FEATURE_EVENT_* as
mentioned in cover letter. But I tend to agree a fix is good, that any app
should better check ioctl retval and errno, before anything else..
--
Peter Xu
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] mm/userfaultfd: Fix uninitialized output field for -EAGAIN race
2025-04-25 15:54 ` James Houghton
@ 2025-04-25 16:27 ` Peter Xu
0 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2025-04-25 16:27 UTC (permalink / raw)
To: James Houghton
Cc: David Hildenbrand, linux-kernel, linux-mm, Mike Rapoport,
Suren Baghdasaryan, Axel Rasmussen, Andrew Morton, linux-stable,
Andrea Arcangeli
On Fri, Apr 25, 2025 at 11:54:47AM -0400, James Houghton wrote:
> On Fri, Apr 25, 2025 at 11:12 AM David Hildenbrand <david@redhat.com> wrote:
> >
> > On 24.04.25 23:57, Peter Xu wrote:
> > > While discussing some userfaultfd relevant issues recently, Andrea noticed
> > > a potential ABI breakage with -EAGAIN on almost all userfaultfd ioctl()s.
> >
> > I guess we talk about e.g., "man UFFDIO_COPY" documentation:
> >
> > "The copy field is used by the kernel to return the number of bytes that
> > was actually copied, or an error (a negated errno-style value). The
> > copy field is output-only; it is not read by the UFFDIO_COPY operation."
> >
> > I assume -EINVAL/-ESRCH/-EFAULT are excluded from that rule, because
> > there is no sense in user-space trying again on these errors either way.
> > Well, there are cases where we would store -EFAULT, when we receive it
> > from mfill_atomic_copy().
> >
> > So if we store -EAGAIN to copy.copy it says "we didn't copy anything".
> > (probably just storing 0 would have been better, but I am sure there was
> > a reason to indicate negative errors in addition to returning an error)
>
> IMHO, it makes more sense to store 0 than -EAGAIN (at least it will
> mean that my userspace[1] won't break).
>
> Userspace will need to know from where to restart the ioctl, and if we
> put -EAGAIN in `mapped`/`copy`/etc., userspace will need to know that
> -EAGAIN actually means 0 anyway.
Yes agreed, the API might be easier to follow if the kernel will only
update >=0 values to copy.copy and only if -EAGAIN is returned, so that
errno will be the only source of truth on the type of error that userapp
must check first. For now, we may need to stick with the current API.
--
Peter Xu
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-04-25 16:27 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-24 21:57 [PATCH 0/2] mm/userfaultfd: Fix uninitialized output field for -EAGAIN race Peter Xu
2025-04-24 21:57 ` [PATCH 1/2] " Peter Xu
2025-04-25 15:12 ` David Hildenbrand
2025-04-25 15:54 ` James Houghton
2025-04-25 16:27 ` Peter Xu
2025-04-24 21:57 ` [PATCH 2/2] mm/selftests: Add a test to verify mmap_changing race with -EAGAIN Peter Xu
2025-04-25 15:45 ` [PATCH 0/2] mm/userfaultfd: Fix uninitialized output field for -EAGAIN race James Houghton
2025-04-25 15:58 ` David Hildenbrand
2025-04-25 16:07 ` James Houghton
2025-04-25 16:18 ` Peter Xu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox