linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr
@ 2026-04-07  8:14 Denis M. Karpov
  2026-04-08  3:21 ` Harry Yoo (Oracle)
  2026-04-08 12:36 ` Usama Arif
  0 siblings, 2 replies; 8+ messages in thread
From: Denis M. Karpov @ 2026-04-07  8:14 UTC (permalink / raw)
  To: rppt, akpm, Liam.Howlett, ljs
  Cc: vbabka, jannh, peterx, pfalcato, brauner, viro, jack, linux-mm,
	linux-fsdevel, linux-kernel, Denis M . Karpov

The current implementation of validate_range() in fs/userfaultfd.c
performs a hard check against mmap_min_addr without considering
capabilities, but the mmap() syscall uses security_mmap_addr()
which allows privileged processes (with CAP_SYS_RAWIO) to map below
mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses
dac_mmap_min_addr variable which can be changed with
/proc/sys/vm/mmap_min_addr.

Because userfaultfd uses a different check, UFFDIO_REGISTER may fail
with -EINVAL for valid memory areas that were successfully mapped
below mmap_min_addr even with appropriate capabilities.

This prevents apps like binary compilers from using UFFD for valid memory
regions mapped by application.

Replace the rigid mmap_min_addr check with security_mmap_addr() to align
userfaultfd with the standard kernel memory mapping security policy.

Signed-off-by: Denis M. Karpov <komlomal@gmail.com>

---
Initial RFC following the discussion on the [BUG] thread.
Link: https://lore.kernel.org/all/CADtiZd0tWysx5HMCUnOXfSHB7PXAuXg1Mh4eY_hUmH29S=sejg@mail.gmail.com/
---
 fs/userfaultfd.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index bdc84e521..dbfe5b2a0 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range(
 		return -EINVAL;
 	if (!len)
 		return -EINVAL;
-	if (start < mmap_min_addr)
-		return -EINVAL;
 	if (start >= task_size)
 		return -EINVAL;
 	if (len > task_size - start)
 		return -EINVAL;
 	if (start + len <= start)
 		return -EINVAL;
-	return 0;
+	return security_mmap_addr(start);
 }
 
 static __always_inline int validate_range(struct mm_struct *mm,
-- 
2.47.3



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr
  2026-04-07  8:14 [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr Denis M. Karpov
@ 2026-04-08  3:21 ` Harry Yoo (Oracle)
  2026-04-08  8:09   ` Denis M. Karpov
  2026-04-08 12:36 ` Usama Arif
  1 sibling, 1 reply; 8+ messages in thread
From: Harry Yoo (Oracle) @ 2026-04-08  3:21 UTC (permalink / raw)
  To: Denis M. Karpov
  Cc: rppt, akpm, Liam.Howlett, ljs, vbabka, jannh, peterx, pfalcato,
	brauner, viro, jack, linux-mm, linux-fsdevel, linux-kernel

On Tue, Apr 07, 2026 at 11:14:42AM +0300, Denis M. Karpov wrote:
> The current implementation of validate_range() in fs/userfaultfd.c
> performs a hard check against mmap_min_addr without considering
> capabilities, but the mmap() syscall uses security_mmap_addr()
> which allows privileged processes (with CAP_SYS_RAWIO) to map below
> mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses
> dac_mmap_min_addr variable which can be changed with
> /proc/sys/vm/mmap_min_addr.
> 
> Because userfaultfd uses a different check, UFFDIO_REGISTER may fail
> with -EINVAL for valid memory areas that were successfully mapped
> below mmap_min_addr even with appropriate capabilities.
> 
> This prevents apps like binary compilers from using UFFD for valid memory
> regions mapped by application.
> 
> Replace the rigid mmap_min_addr check with security_mmap_addr() to align
> userfaultfd with the standard kernel memory mapping security policy.

Perhaps worth adding

Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")

> Signed-off-by: Denis M. Karpov <komlomal@gmail.com>
> 
> ---
>  fs/userfaultfd.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index bdc84e521..dbfe5b2a0 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range(
>  		return -EINVAL;
>  	if (!len)
>  		return -EINVAL;
> -	if (start < mmap_min_addr)
> -		return -EINVAL;
>  	if (start >= task_size)
>  		return -EINVAL;
>  	if (len > task_size - start)
>  		return -EINVAL;
>  	if (start + len <= start)
>  		return -EINVAL;
> -	return 0;
> +	return security_mmap_addr(start);

Hmm but it looks bit strange to check capability for address that is
already mapped by mmap(). Why is this required?

>  }

-- 
Cheers,
Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr
  2026-04-08  3:21 ` Harry Yoo (Oracle)
@ 2026-04-08  8:09   ` Denis M. Karpov
  2026-04-09  2:51     ` Harry Yoo (Oracle)
  0 siblings, 1 reply; 8+ messages in thread
From: Denis M. Karpov @ 2026-04-08  8:09 UTC (permalink / raw)
  To: Harry Yoo (Oracle), Andrea Arcangeli
  Cc: rppt, akpm, Liam.Howlett, ljs, vbabka, jannh, peterx, pfalcato,
	brauner, viro, jack, linux-mm, linux-fsdevel, linux-kernel

> Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
Thank you, I will add this Fixes tag in the next patch.

> Hmm but it looks bit strange to check capability for address that is
> already mapped by mmap(). Why is this required?
Actually, it's not obvious to me either, but I may miss something. My intent was
to replace the current restrictive check with a more flexible one. I think
performing this check here allows us to deny invalid requests early,
before locks or VMA lookups occur.

Removing this check entirely would also allow using UFFD in cases where a task
drops privileges after the initial mmap(). This seems reasonable because the
VMA already exists, i.e. kernel already allowed this mapping.

In the [BUG] thread discussion Andrea Arcangeli also suggested adding
a check for
FIRST_USER_ADDRESS to handle architectural constraints.

Andrea, could you please comment on this? Specifically, would a
check against FIRST_USER_ADDRESS sufficient here, or do we still
need to check caps?



On Wed, Apr 8, 2026 at 6:21 AM Harry Yoo (Oracle) <harry@kernel.org> wrote:
>
> On Tue, Apr 07, 2026 at 11:14:42AM +0300, Denis M. Karpov wrote:
> > The current implementation of validate_range() in fs/userfaultfd.c
> > performs a hard check against mmap_min_addr without considering
> > capabilities, but the mmap() syscall uses security_mmap_addr()
> > which allows privileged processes (with CAP_SYS_RAWIO) to map below
> > mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses
> > dac_mmap_min_addr variable which can be changed with
> > /proc/sys/vm/mmap_min_addr.
> >
> > Because userfaultfd uses a different check, UFFDIO_REGISTER may fail
> > with -EINVAL for valid memory areas that were successfully mapped
> > below mmap_min_addr even with appropriate capabilities.
> >
> > This prevents apps like binary compilers from using UFFD for valid memory
> > regions mapped by application.
> >
> > Replace the rigid mmap_min_addr check with security_mmap_addr() to align
> > userfaultfd with the standard kernel memory mapping security policy.
>
> Perhaps worth adding
>
> Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
>
> > Signed-off-by: Denis M. Karpov <komlomal@gmail.com>
> >
> > ---
> >  fs/userfaultfd.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > index bdc84e521..dbfe5b2a0 100644
> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range(
> >               return -EINVAL;
> >       if (!len)
> >               return -EINVAL;
> > -     if (start < mmap_min_addr)
> > -             return -EINVAL;
> >       if (start >= task_size)
> >               return -EINVAL;
> >       if (len > task_size - start)
> >               return -EINVAL;
> >       if (start + len <= start)
> >               return -EINVAL;
> > -     return 0;
> > +     return security_mmap_addr(start);
>
> Hmm but it looks bit strange to check capability for address that is
> already mapped by mmap(). Why is this required?
>
> >  }
>
> --
> Cheers,
> Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr
  2026-04-07  8:14 [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr Denis M. Karpov
  2026-04-08  3:21 ` Harry Yoo (Oracle)
@ 2026-04-08 12:36 ` Usama Arif
  2026-04-09  8:01   ` Lorenzo Stoakes
  1 sibling, 1 reply; 8+ messages in thread
From: Usama Arif @ 2026-04-08 12:36 UTC (permalink / raw)
  To: Denis M. Karpov
  Cc: Usama Arif, rppt, akpm, Liam.Howlett, ljs, vbabka, jannh, peterx,
	pfalcato, brauner, viro, jack, linux-mm, linux-fsdevel,
	linux-kernel

On Tue,  7 Apr 2026 11:14:42 +0300 "Denis M. Karpov" <komlomal@gmail.com> wrote:

> The current implementation of validate_range() in fs/userfaultfd.c
> performs a hard check against mmap_min_addr without considering
> capabilities, but the mmap() syscall uses security_mmap_addr()
> which allows privileged processes (with CAP_SYS_RAWIO) to map below
> mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses
> dac_mmap_min_addr variable which can be changed with
> /proc/sys/vm/mmap_min_addr.
> 
> Because userfaultfd uses a different check, UFFDIO_REGISTER may fail
> with -EINVAL for valid memory areas that were successfully mapped
> below mmap_min_addr even with appropriate capabilities.
> 
> This prevents apps like binary compilers from using UFFD for valid memory
> regions mapped by application.
> 
> Replace the rigid mmap_min_addr check with security_mmap_addr() to align
> userfaultfd with the standard kernel memory mapping security policy.
> 
> Signed-off-by: Denis M. Karpov <komlomal@gmail.com>
> 
> ---
> Initial RFC following the discussion on the [BUG] thread.
> Link: https://lore.kernel.org/all/CADtiZd0tWysx5HMCUnOXfSHB7PXAuXg1Mh4eY_hUmH29S=sejg@mail.gmail.com/
> ---
>  fs/userfaultfd.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index bdc84e521..dbfe5b2a0 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range(
>  		return -EINVAL;
>  	if (!len)
>  		return -EINVAL;
> -	if (start < mmap_min_addr)
> -		return -EINVAL;
>  	if (start >= task_size)
>  		return -EINVAL;
>  	if (len > task_size - start)
>  		return -EINVAL;
>  	if (start + len <= start)
>  		return -EINVAL;
> -	return 0;
> +	return security_mmap_addr(start);

Is this introducing an ABI change?

The old code returned -EINVAL when start was below mmap_min_addr.
The new code calls security_mmap_addr() which returns -EPERM when
the caller lacks CAP_SYS_RAWIO. Existing userspace callers checking
specifically for -EINVAL would see different behavior start is
below mmap_min_addr.

>  }
>  
>  static __always_inline int validate_range(struct mm_struct *mm,
> -- 
> 2.47.3
> 
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr
  2026-04-08  8:09   ` Denis M. Karpov
@ 2026-04-09  2:51     ` Harry Yoo (Oracle)
  2026-04-09  7:58       ` Lorenzo Stoakes
  0 siblings, 1 reply; 8+ messages in thread
From: Harry Yoo (Oracle) @ 2026-04-09  2:51 UTC (permalink / raw)
  To: Denis M. Karpov
  Cc: Andrea Arcangeli, rppt, akpm, Liam.Howlett, ljs, vbabka, jannh,
	peterx, pfalcato, brauner, viro, jack, linux-mm, linux-fsdevel,
	linux-kernel

On Wed, Apr 08, 2026 at 11:09:00AM +0300, Denis M. Karpov wrote:
> > Hmm but it looks bit strange to check capability for address that is
> > already mapped by mmap(). Why is this required?
>
> Actually, it's not obvious to me either, but I may miss something.
> My intent was to replace the current restrictive check with a more flexible one.

Technically, it's less restrictive only if start < mmap_min_addr
(setting aside the discussion of whether this is an appropriate check).

Otherwise (start >= mmap_min_addr) it's more restrictive? (now, the process
should have the capability when registering an existing VMA to userfaultfd)

> I think performing this check here allows us to deny invalid requests early,
> before locks or VMA lookups occur.

But we're not trying to optimize it and we shouldn't add checks without
a proper explanation for the sake of optimization.

> Removing this check entirely would also allow using UFFD in cases where a task
> drops privileges after the initial mmap(). This seems reasonable because the
> VMA already exists, i.e. kernel already allowed this mapping.

Yeah, that seems reasonable to me.

IOW, I don't think "creating a VMA on a specific address (w/ proper
capabilities) is okay but once it is registered to userfaultfd,
it becomes a security hole" is a valid argument.

And we don't unmap those mappings when the process loses the capability
to map them anyway.

> In the [BUG] thread discussion

Was it a private discussion? I can't find Andrea's emails on the thread.

> Andrea Arcangeli also suggested adding a check for
> FIRST_USER_ADDRESS to handle architectural constraints.

Again, what's the point of checking this on the VMA that is already created?
*checks why FIRST_USER_ADDRESS was introduced*

commit e2cdef8c847b480529b7e26991926aab4be008e6
Author: Hugh Dickins <hugh@veritas.com>
Date:   Tue Apr 19 13:29:19 2005 -0700

    [PATCH] freepgt: free_pgtables from FIRST_USER_ADDRESS

    The patches to free_pgtables by vma left problems on any architectures which
    leave some user address page table entries unencapsulated by vma.  Andi has
    fixed the 32-bit vDSO on x86_64 to use a vma.  Now fix arm (and arm26), whose
    first PAGE_SIZE is reserved (perhaps) for machine vectors.

    Our calls to free_pgtables must not touch that area, and exit_mmap's
    BUG_ON(nr_ptes) must allow that arm's get_pgd_slow may (or may not) have
    allocated an extra page table, which its free_pgd_slow would free later.

    FIRST_USER_PGD_NR has misled me and others: until all the arches define
    FIRST_USER_ADDRESS instead, a hack in mmap.c to derive one from t'other.  This
    patch fixes the bugs, the remaining patches just clean it up.

    Signed-off-by: Hugh Dickins <hugh@veritas.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Oh, ok. there might be a raw mapping without VMA below FIRST_USER_ADDRESS.

Adding such a check wouldn't hurt... but if there is no VMA, you can't
register the range to userfaultfd anyway?

> Andrea, could you please comment on this? Specifically, would a
> check against FIRST_USER_ADDRESS sufficient here, or do we still
> need to check caps?
> 
> On Wed, Apr 8, 2026 at 6:21 AM Harry Yoo (Oracle) <harry@kernel.org> wrote:
> >
> > On Tue, Apr 07, 2026 at 11:14:42AM +0300, Denis M. Karpov wrote:
> > > The current implementation of validate_range() in fs/userfaultfd.c
> > > performs a hard check against mmap_min_addr without considering
> > > capabilities, but the mmap() syscall uses security_mmap_addr()
> > > which allows privileged processes (with CAP_SYS_RAWIO) to map below
> > > mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses
> > > dac_mmap_min_addr variable which can be changed with
> > > /proc/sys/vm/mmap_min_addr.
> > >
> > > Because userfaultfd uses a different check, UFFDIO_REGISTER may fail
> > > with -EINVAL for valid memory areas that were successfully mapped
> > > below mmap_min_addr even with appropriate capabilities.
> > >
> > > This prevents apps like binary compilers from using UFFD for valid memory
> > > regions mapped by application.
> > >
> > > Replace the rigid mmap_min_addr check with security_mmap_addr() to align
> > > userfaultfd with the standard kernel memory mapping security policy.
> >
> > Perhaps worth adding
> >
> > Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
> >
> > > Signed-off-by: Denis M. Karpov <komlomal@gmail.com>
> > >
> > > ---
> > >  fs/userfaultfd.c | 4 +---
> > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > >
> > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > > index bdc84e521..dbfe5b2a0 100644
> > > --- a/fs/userfaultfd.c
> > > +++ b/fs/userfaultfd.c
> > > @@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range(
> > >               return -EINVAL;
> > >       if (!len)
> > >               return -EINVAL;
> > > -     if (start < mmap_min_addr)
> > > -             return -EINVAL;
> > >       if (start >= task_size)
> > >               return -EINVAL;
> > >       if (len > task_size - start)
> > >               return -EINVAL;
> > >       if (start + len <= start)
> > >               return -EINVAL;
> > > -     return 0;
> > > +     return security_mmap_addr(start);
> >
> > Hmm but it looks bit strange to check capability for address that is
> > already mapped by mmap(). Why is this required?

-- 
Cheers,
Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr
  2026-04-09  2:51     ` Harry Yoo (Oracle)
@ 2026-04-09  7:58       ` Lorenzo Stoakes
  0 siblings, 0 replies; 8+ messages in thread
From: Lorenzo Stoakes @ 2026-04-09  7:58 UTC (permalink / raw)
  To: Harry Yoo (Oracle)
  Cc: Denis M. Karpov, Andrea Arcangeli, rppt, akpm, Liam.Howlett,
	vbabka, jannh, peterx, pfalcato, brauner, viro, jack, linux-mm,
	linux-fsdevel, linux-kernel

On Thu, Apr 09, 2026 at 11:51:11AM +0900, Harry Yoo (Oracle) wrote:
> On Wed, Apr 08, 2026 at 11:09:00AM +0300, Denis M. Karpov wrote:
> > > Hmm but it looks bit strange to check capability for address that is
> > > already mapped by mmap(). Why is this required?
> >
> > Actually, it's not obvious to me either, but I may miss something.
> > My intent was to replace the current restrictive check with a more flexible one.
>
> Technically, it's less restrictive only if start < mmap_min_addr
> (setting aside the discussion of whether this is an appropriate check).
>
> Otherwise (start >= mmap_min_addr) it's more restrictive? (now, the process
> should have the capability when registering an existing VMA to userfaultfd)
>
> > I think performing this check here allows us to deny invalid requests early,
> > before locks or VMA lookups occur.
>
> But we're not trying to optimize it and we shouldn't add checks without
> a proper explanation for the sake of optimization.

Duplicating this kind of logic in the already horribly duplicative (and more
generally, horrible) UFFD implementation is actively buggy and incorrect IMO.

I also find it extremely odd that we are validating that a... source
address... is... mapped that way (in userfaultfd_copy(), we validate
uffdio_copy.src using validate_unaligned_range(), as well as the destination via
validate_range()).

It just makes no sense to me at all.

Let's get rid of it.

>
> > Removing this check entirely would also allow using UFFD in cases where a task
> > drops privileges after the initial mmap(). This seems reasonable because the
> > VMA already exists, i.e. kernel already allowed this mapping.
>
> Yeah, that seems reasonable to me.
>
> IOW, I don't think "creating a VMA on a specific address (w/ proper
> capabilities) is okay but once it is registered to userfaultfd,
> it becomes a security hole" is a valid argument.

Yes.

>
> And we don't unmap those mappings when the process loses the capability
> to map them anyway.

Once it's mapped it's mapped...?

>
> > In the [BUG] thread discussion
>
> Was it a private discussion? I can't find Andrea's emails on the thread.
>
> > Andrea Arcangeli also suggested adding a check for
> > FIRST_USER_ADDRESS to handle architectural constraints.
>
> Again, what's the point of checking this on the VMA that is already created?
> *checks why FIRST_USER_ADDRESS was introduced*

Yeah this is just the exact same thing with a different thing to compare
against no?

copy_from_user() will handle this in mfill_copy_folio_locked(), returning an
error if a user tried to copy from somewhere they shouldn't have (the same way
as if the user tried to copy from somewhere else they shouldn't have).

Let's not block on off-list sidebars.

>
> commit e2cdef8c847b480529b7e26991926aab4be008e6
> Author: Hugh Dickins <hugh@veritas.com>
> Date:   Tue Apr 19 13:29:19 2005 -0700
>
>     [PATCH] freepgt: free_pgtables from FIRST_USER_ADDRESS
>
>     The patches to free_pgtables by vma left problems on any architectures which
>     leave some user address page table entries unencapsulated by vma.  Andi has
>     fixed the 32-bit vDSO on x86_64 to use a vma.  Now fix arm (and arm26), whose
>     first PAGE_SIZE is reserved (perhaps) for machine vectors.
>
>     Our calls to free_pgtables must not touch that area, and exit_mmap's
>     BUG_ON(nr_ptes) must allow that arm's get_pgd_slow may (or may not) have
>     allocated an extra page table, which its free_pgd_slow would free later.
>
>     FIRST_USER_PGD_NR has misled me and others: until all the arches define
>     FIRST_USER_ADDRESS instead, a hack in mmap.c to derive one from t'other.  This
>     patch fixes the bugs, the remaining patches just clean it up.
>
>     Signed-off-by: Hugh Dickins <hugh@veritas.com>
>     Signed-off-by: Andrew Morton <akpm@osdl.org>
>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>
> Oh, ok. there might be a raw mapping without VMA below FIRST_USER_ADDRESS.
>
> Adding such a check wouldn't hurt... but if there is no VMA, you can't
> register the range to userfaultfd anyway?

Exactly... and I don't want to see us randomly do checks that already happened
previously.

Putting duplicated bitrot-baiting code in what is one of the worst areas of
mm is not something I want us to do, and would like us to actively remove
anything that already exists like this.

And the fact that this is in an fs/ file is even more annoying to me. Really I
don't think _any_ meaningful uffd logic belongs there. Especially since we have
a bunch of other uffd crap in mm/userfaultfd.c.

The fs/userfaultfd.c file should be a bare-bones thing that handles the fs side
of uffd _only_.

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr
  2026-04-08 12:36 ` Usama Arif
@ 2026-04-09  8:01   ` Lorenzo Stoakes
  2026-04-09  9:05     ` Denis M. Karpov
  0 siblings, 1 reply; 8+ messages in thread
From: Lorenzo Stoakes @ 2026-04-09  8:01 UTC (permalink / raw)
  To: Usama Arif
  Cc: Denis M. Karpov, rppt, akpm, Liam.Howlett, vbabka, jannh, peterx,
	pfalcato, brauner, viro, jack, linux-mm, linux-fsdevel,
	linux-kernel

On Wed, Apr 08, 2026 at 05:36:59AM -0700, Usama Arif wrote:
> On Tue,  7 Apr 2026 11:14:42 +0300 "Denis M. Karpov" <komlomal@gmail.com> wrote:
>
> > The current implementation of validate_range() in fs/userfaultfd.c
> > performs a hard check against mmap_min_addr without considering
> > capabilities, but the mmap() syscall uses security_mmap_addr()
> > which allows privileged processes (with CAP_SYS_RAWIO) to map below
> > mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses
> > dac_mmap_min_addr variable which can be changed with
> > /proc/sys/vm/mmap_min_addr.
> >
> > Because userfaultfd uses a different check, UFFDIO_REGISTER may fail
> > with -EINVAL for valid memory areas that were successfully mapped
> > below mmap_min_addr even with appropriate capabilities.
> >
> > This prevents apps like binary compilers from using UFFD for valid memory
> > regions mapped by application.
> >
> > Replace the rigid mmap_min_addr check with security_mmap_addr() to align
> > userfaultfd with the standard kernel memory mapping security policy.
> >
> > Signed-off-by: Denis M. Karpov <komlomal@gmail.com>
> >
> > ---
> > Initial RFC following the discussion on the [BUG] thread.
> > Link: https://lore.kernel.org/all/CADtiZd0tWysx5HMCUnOXfSHB7PXAuXg1Mh4eY_hUmH29S=sejg@mail.gmail.com/
> > ---
> >  fs/userfaultfd.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > index bdc84e521..dbfe5b2a0 100644
> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range(
> >  		return -EINVAL;
> >  	if (!len)
> >  		return -EINVAL;
> > -	if (start < mmap_min_addr)
> > -		return -EINVAL;
> >  	if (start >= task_size)
> >  		return -EINVAL;
> >  	if (len > task_size - start)
> >  		return -EINVAL;
> >  	if (start + len <= start)
> >  		return -EINVAL;
> > -	return 0;
> > +	return security_mmap_addr(start);
>
> Is this introducing an ABI change?
>
> The old code returned -EINVAL when start was below mmap_min_addr.
> The new code calls security_mmap_addr() which returns -EPERM when
> the caller lacks CAP_SYS_RAWIO. Existing userspace callers checking
> specifically for -EINVAL would see different behavior start is
> below mmap_min_addr.

You mean API change? :) we don't guarantee ABI for kernel stuff anyway.

Firstly, as with Harry, I don't believe we should be duplicating checks here
anyway. UFFD is duplicative enough as it is.

And this is such a silly edge case that I don't think it is valid or reasonable
for us to account for whichever totally insane user relies on a pointless
re-check being done there and _then_ relies on the error code
being... -EINVAL... which is overloaded for a million other possible failures.

Let's let it be -EFAULT and remove this silly check altogether.

>
> >  }
> >
> >  static __always_inline int validate_range(struct mm_struct *mm,
> > --
> > 2.47.3
> >
> >

Thanks, Lorenzo


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr
  2026-04-09  8:01   ` Lorenzo Stoakes
@ 2026-04-09  9:05     ` Denis M. Karpov
  0 siblings, 0 replies; 8+ messages in thread
From: Denis M. Karpov @ 2026-04-09  9:05 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Usama Arif, rppt, akpm, Liam.Howlett, vbabka, jannh, peterx,
	pfalcato, brauner, viro, jack, linux-mm, linux-fsdevel,
	linux-kernel, Andrea Arcangeli

(to Harry)
> Technically, it's less restrictive only if start < mmap_min_addr
> (setting aside the discussion of whether this is an appropriate check).
>
> Otherwise (start >= mmap_min_addr) it's more restrictive? (now, the process
> should have the capability when registering an existing VMA to userfaultfd)
Hmm, I can't find any checks for addr >= mmap_min_addr in the security
subsystem,
only if addr < mmap_min_addr. Otherwise, one would need capabilities for
regular mmap() calls as well.

> Was it a private discussion? I can't find Andrea's emails on the thread.
Oh, it seems Andrea accidentally dropped some recipients from the CC list.
I have CC'd him here so he can clarify his points if he feels it is necessary.

(to Lorenzo)
>Duplicating this kind of logic in the already horribly duplicative (and more
>generally, horrible) UFFD implementation is actively buggy and incorrect IMO.
So, no security_mmap_addr check, no FIRST_USER_ADDRESS check.
Thank you both for the review. I'll prepare the patch.

On Thu, Apr 9, 2026 at 11:01 AM Lorenzo Stoakes <ljs@kernel.org> wrote:
>
> On Wed, Apr 08, 2026 at 05:36:59AM -0700, Usama Arif wrote:
> > On Tue,  7 Apr 2026 11:14:42 +0300 "Denis M. Karpov" <komlomal@gmail.com> wrote:
> >
> > > The current implementation of validate_range() in fs/userfaultfd.c
> > > performs a hard check against mmap_min_addr without considering
> > > capabilities, but the mmap() syscall uses security_mmap_addr()
> > > which allows privileged processes (with CAP_SYS_RAWIO) to map below
> > > mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses
> > > dac_mmap_min_addr variable which can be changed with
> > > /proc/sys/vm/mmap_min_addr.
> > >
> > > Because userfaultfd uses a different check, UFFDIO_REGISTER may fail
> > > with -EINVAL for valid memory areas that were successfully mapped
> > > below mmap_min_addr even with appropriate capabilities.
> > >
> > > This prevents apps like binary compilers from using UFFD for valid memory
> > > regions mapped by application.
> > >
> > > Replace the rigid mmap_min_addr check with security_mmap_addr() to align
> > > userfaultfd with the standard kernel memory mapping security policy.
> > >
> > > Signed-off-by: Denis M. Karpov <komlomal@gmail.com>
> > >
> > > ---
> > > Initial RFC following the discussion on the [BUG] thread.
> > > Link: https://lore.kernel.org/all/CADtiZd0tWysx5HMCUnOXfSHB7PXAuXg1Mh4eY_hUmH29S=sejg@mail.gmail.com/
> > > ---
> > >  fs/userfaultfd.c | 4 +---
> > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > >
> > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > > index bdc84e521..dbfe5b2a0 100644
> > > --- a/fs/userfaultfd.c
> > > +++ b/fs/userfaultfd.c
> > > @@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range(
> > >             return -EINVAL;
> > >     if (!len)
> > >             return -EINVAL;
> > > -   if (start < mmap_min_addr)
> > > -           return -EINVAL;
> > >     if (start >= task_size)
> > >             return -EINVAL;
> > >     if (len > task_size - start)
> > >             return -EINVAL;
> > >     if (start + len <= start)
> > >             return -EINVAL;
> > > -   return 0;
> > > +   return security_mmap_addr(start);
> >
> > Is this introducing an ABI change?
> >
> > The old code returned -EINVAL when start was below mmap_min_addr.
> > The new code calls security_mmap_addr() which returns -EPERM when
> > the caller lacks CAP_SYS_RAWIO. Existing userspace callers checking
> > specifically for -EINVAL would see different behavior start is
> > below mmap_min_addr.
>
> You mean API change? :) we don't guarantee ABI for kernel stuff anyway.
>
> Firstly, as with Harry, I don't believe we should be duplicating checks here
> anyway. UFFD is duplicative enough as it is.
>
> And this is such a silly edge case that I don't think it is valid or reasonable
> for us to account for whichever totally insane user relies on a pointless
> re-check being done there and _then_ relies on the error code
> being... -EINVAL... which is overloaded for a million other possible failures.
>
> Let's let it be -EFAULT and remove this silly check altogether.
>
> >
> > >  }
> > >
> > >  static __always_inline int validate_range(struct mm_struct *mm,
> > > --
> > > 2.47.3
> > >
> > >
>
> Thanks, Lorenzo


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-04-09  9:06 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-07  8:14 [RFC PATCH] userfaultfd: allow registration of ranges below mmap_min_addr Denis M. Karpov
2026-04-08  3:21 ` Harry Yoo (Oracle)
2026-04-08  8:09   ` Denis M. Karpov
2026-04-09  2:51     ` Harry Yoo (Oracle)
2026-04-09  7:58       ` Lorenzo Stoakes
2026-04-08 12:36 ` Usama Arif
2026-04-09  8:01   ` Lorenzo Stoakes
2026-04-09  9:05     ` Denis M. Karpov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox