* [PATCH] mm/ksm: fix flag-dropping behavior in ksm_madvise
@ 2025-09-30 6:39 Jakub Acs
2025-09-30 6:45 ` David Hildenbrand
0 siblings, 1 reply; 3+ messages in thread
From: Jakub Acs @ 2025-09-30 6:39 UTC (permalink / raw)
To: linux-mm
Cc: acsjakub, Andrew Morton, David Hildenbrand, Xu Xin,
Chengming Zhou, Peter Xu, Axel Rasmussen, Mike Kravetz,
linux-kernel, stable
syzkaller discovered the following crash: (kernel BUG)
[ 44.607039] ------------[ cut here ]------------
[ 44.607422] kernel BUG at mm/userfaultfd.c:2067!
[ 44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[ 44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
[ 44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
<snip other registers, drop unreliable trace>
[ 44.617726] Call Trace:
[ 44.617926] <TASK>
[ 44.619284] userfaultfd_release+0xef/0x1b0
[ 44.620976] __fput+0x3f9/0xb60
[ 44.621240] fput_close_sync+0x110/0x210
[ 44.622222] __x64_sys_close+0x8f/0x120
[ 44.622530] do_syscall_64+0x5b/0x2f0
[ 44.622840] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 44.623244] RIP: 0033:0x7f365bb3f227
The kernel hits this BUG because it detects a UFFD inconsistency during
userfaultfd_release_all(): a VMA that still has a valid
vma->vm_userfaultfd_ctx pointer but no UFFD flags set in vma->vm_flags.
The inconsistency is introduced in ksm_madvise(): when a user calls madvise()
with MADV_UNMERGEABLE on a VMA that is registered for UFFD in MINOR
mode, it accidentally clears all flags stored in the upper 32 bits of
vma->vm_flags.
Assuming an x86_64 kernel build, unsigned long is 64 bits wide and unsigned
int and int are 32 bits wide. This setup causes the following mishap during
the &= ~VM_MERGEABLE assignment.
VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
After ~ is applied, it becomes 0x7fff'ffff, still of type unsigned int,
which is then converted to unsigned long before the & operation. Because
the source type is unsigned, this conversion zero-extends: the upper 32
bits are filled with 0s (and even a sign-extending conversion would not
help, as the leading bit is 0). The & operation thus ends up AND-ing
vm_flags with 0x0000'0000'7fff'ffff instead of the intended
0xffff'ffff'7fff'ffff and hence accidentally clears the upper 32 bits of
its value.
Fix it by casting the VM_MERGEABLE constant to unsigned long before
applying ~, so that the upper 32 bits of the mask are set and the upper
32 bits of vm_flags are preserved.
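
The conversion described above can be reproduced in user space. The following is only a sketch, assuming a 32-bit int: the DEMO_* names and the choice of bit 38 for the high flag are hypothetical stand-ins and not the kernel's definitions, and uint64_t is used to model a 64-bit unsigned long vm_flags.

```c
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define DEMO_VM_MERGEABLE 0x80000000          /* unsigned int when int is 32-bit */
#define DEMO_HIGH_FLAG    (UINT64_C(1) << 38) /* arbitrary flag in the upper 32 bits */

typedef uint64_t demo_vm_flags_t; /* models a 64-bit unsigned long vm_flags */

/* Same shape as the pre-patch statement: ~ is evaluated in 32 bits and the
 * result is zero-extended, so the mask has no bits set above bit 31. */
static demo_vm_flags_t demo_clear_mergeable_buggy(demo_vm_flags_t flags)
{
	flags &= ~DEMO_VM_MERGEABLE;
	return flags;
}

/* Same shape as the patched statement: widen first, then invert. */
static demo_vm_flags_t demo_clear_mergeable_fixed(demo_vm_flags_t flags)
{
	flags &= ~((demo_vm_flags_t)DEMO_VM_MERGEABLE);
	return flags;
}
```

Starting from DEMO_HIGH_FLAG | DEMO_VM_MERGEABLE | 0x1, the buggy variant returns 0x1 (the high flag is lost along with VM_MERGEABLE), while the fixed variant returns DEMO_HIGH_FLAG | 0x1.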
Note: other VM_* flags are not affected.
This only happens with the VM_MERGEABLE flag, as the other VM_* flags are
all constants of type int. After the ~ operation they have a leading 1 bit
(i.e. they are negative) and are thus sign-extended to unsigned long with
leading 1s.
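
The difference between the two constant types can be checked with a small sketch, again assuming a 32-bit int. The DEMO_* names and helper functions are hypothetical; 0x40000000 is used only as an example of a constant that fits in int.

```c
#include <stdint.h>

/* Hypothetical constants for illustration. */
#define DEMO_FLAG_INT  0x40000000 /* fits in int, so the literal has type int */
#define DEMO_FLAG_UINT 0x80000000 /* does not fit in int, so it is unsigned int */

/* ~ yields a negative int; converting it to 64 bits sign-extends,
 * so the upper 32 bits of the mask are all 1s. */
static uint64_t demo_widen_int_mask(void)
{
	return (uint64_t)~DEMO_FLAG_INT;
}

/* ~ yields an unsigned int; converting it to 64 bits zero-extends,
 * so the upper 32 bits of the mask are all 0s. */
static uint64_t demo_widen_uint_mask(void)
{
	return (uint64_t)~DEMO_FLAG_UINT;
}
```

Under these assumptions the first helper returns 0xffff'ffff'bfff'ffff and the second returns 0x0000'0000'7fff'ffff.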
Note 2:
After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
no longer a kernel BUG, but a WARNING at the same place:
[ 45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067
but the root-cause (flag-drop) remains the same.
Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")
Signed-off-by: Jakub Acs <acsjakub@amazon.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Xu Xin <xu.xin16@zte.com.cn>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
---
I looked around the kernel and found one more flag that might cause
similar issues: IORESOURCE_BUSY, as its inverted version is
bitwise-ANDed with unsigned long fields. However, it seems those fields
don't actually use any of the upper 32 bits as flags (yet?).
I also considered changing the constant definition by adding a ULL suffix,
but I am not sure where else that could blow up, plus it would likely call
for defining all the related constants as ULL for consistency. If you'd
prefer that fix, let me know.
mm/ksm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/ksm.c b/mm/ksm.c
index 160787bb121c..c24137a1eeb7 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2871,7 +2871,7 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
return err;
}
- *vm_flags &= ~VM_MERGEABLE;
+ *vm_flags &= ~((unsigned long) VM_MERGEABLE);
break;
}
--
2.47.3
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
* Re: [PATCH] mm/ksm: fix flag-dropping behavior in ksm_madvise
2025-09-30 6:39 [PATCH] mm/ksm: fix flag-dropping behavior in ksm_madvise Jakub Acs
@ 2025-09-30 6:45 ` David Hildenbrand
2025-09-30 13:03 ` Jakub Acs
0 siblings, 1 reply; 3+ messages in thread
From: David Hildenbrand @ 2025-09-30 6:45 UTC (permalink / raw)
To: Jakub Acs, linux-mm
Cc: Andrew Morton, Xu Xin, Chengming Zhou, Peter Xu, Axel Rasmussen,
Mike Kravetz, linux-kernel, stable
On 30.09.25 08:39, Jakub Acs wrote:
> syzkaller discovered the following crash: (kernel BUG)
>
> [ 44.607039] ------------[ cut here ]------------
> [ 44.607422] kernel BUG at mm/userfaultfd.c:2067!
> [ 44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> [ 44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
> [ 44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> [ 44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
>
> <snip other registers, drop unreliable trace>
>
> [ 44.617726] Call Trace:
> [ 44.617926] <TASK>
> [ 44.619284] userfaultfd_release+0xef/0x1b0
> [ 44.620976] __fput+0x3f9/0xb60
> [ 44.621240] fput_close_sync+0x110/0x210
> [ 44.622222] __x64_sys_close+0x8f/0x120
> [ 44.622530] do_syscall_64+0x5b/0x2f0
> [ 44.622840] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 44.623244] RIP: 0033:0x7f365bb3f227
>
> Kernel panics because it detects UFFD inconsistency during
> userfaultfd_release_all(). Specifically, a VMA which has a valid pointer
> to vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.
>
> The inconsistency is caused in ksm_madvise(): when user calls madvise()
> with MADV_UNMERGEABLE on a VMA that is registered for UFFD in MINOR
> mode, it accidentally clears all flags stored in the upper 32 bits of
> vma->vm_flags.
>
> Assuming x86_64 kernel build, unsigned long is 64-bit and unsigned int
> and int are 32-bit wide. This setup causes the following mishap during
> the &= ~VM_MERGEABLE assignment.
>
> VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
> After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
> promoted to unsigned long before the & operation. This promotion fills
> upper 32 bits with leading 0s, as we're doing unsigned conversion (and
> even for a signed conversion, this wouldn't help as the leading bit is
> 0). & operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
> instead of intended 0xffff'ffff'7fff'ffff and hence accidentally clears
> the upper 32-bits of its value.
>
> Fix it by casting `VM_MERGEABLE` constant to unsigned long to preserve
> the upper 32 bits, in case it's needed.
>
> Note: other VM_* flags are not affected:
> This only happens to the VM_MERGEABLE flag, as the other VM_* flags are
> all constants of type int and after ~ operation, they end up with
> leading 1 and are thus converted to unsigned long with leading 1s.
>
> Note 2:
> After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
> no longer a kernel BUG, but a WARNING at the same place:
>
> [ 45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067
>
> but the root-cause (flag-drop) remains the same.
>
> Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")
> Signed-off-by: Jakub Acs <acsjakub@amazon.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Xu Xin <xu.xin16@zte.com.cn>
> Cc: Chengming Zhou <chengming.zhou@linux.dev>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Axel Rasmussen <axelrasmussen@google.com>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org
> ---
>
> I looked around the kernel and found one more flag that might be
> causing similar issues: "IORESOURCE_BUSY" - as its inverted version is
> bit-anded to unsigned long fields. However, it seems those fields don't
> actually use any bits from upper 32-bits as flags (yet?).
>
> I also considered changing the constant definition by adding ULL, but am
> not sure where else that could blow up, plus it would likely call to
> define all the related constants as ULL for consistency. If you'd prefer
> that fix, let me know.
>
>
> mm/ksm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 160787bb121c..c24137a1eeb7 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -2871,7 +2871,7 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
> return err;
> }
>
> - *vm_flags &= ~VM_MERGEABLE;
> + *vm_flags &= ~((unsigned long) VM_MERGEABLE);
> break;
> }
>
Wouldn't it be better to just do
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1ae97a0b8ec75..0eaf8af153f98 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -296,7 +296,7 @@ extern unsigned int kobjsize(const void *objp);
#define VM_MIXEDMAP 0x10000000 /* Can contain "struct page" and pure PFN pages */
#define VM_HUGEPAGE 0x20000000 /* MADV_HUGEPAGE marked this vma */
#define VM_NOHUGEPAGE 0x40000000 /* MADV_NOHUGEPAGE marked this vma */
-#define VM_MERGEABLE 0x80000000 /* KSM may merge identical pages */
+#define VM_MERGEABLE 0x80000000ul /* KSM may merge identical pages */
#ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
#define VM_HIGH_ARCH_BIT_0 32 /* bit only usable on 64-bit architectures */
And for consistency doing it to all other flags as well? After all we have
typedef unsigned long vm_flags_t;
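
As a rough illustration of why the suffix helps: with `ul` the constant itself has type unsigned long, so ~ is evaluated at the full width of the flags type and no cast is needed at the call site. The names below are hypothetical and only mirror the shape of the suggestion.

```c
#include <limits.h>

/* Hypothetical stand-ins; only the `ul` suffix mirrors the suggestion above. */
#define DEMO_VM_MERGEABLE_UL 0x80000000ul

typedef unsigned long demo_vm_flags_t;

/* The mask is computed in unsigned long, so every bit other than bit 31
 * stays set regardless of how wide unsigned long is. */
static demo_vm_flags_t demo_clear_mergeable(demo_vm_flags_t flags)
{
	return flags & ~DEMO_VM_MERGEABLE_UL;
}
```

Calling demo_clear_mergeable(ULONG_MAX) clears only bit 31 and leaves all other bits set.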
--
Cheers
David / dhildenb
* Re: [PATCH] mm/ksm: fix flag-dropping behavior in ksm_madvise
2025-09-30 6:45 ` David Hildenbrand
@ 2025-09-30 13:03 ` Jakub Acs
0 siblings, 0 replies; 3+ messages in thread
From: Jakub Acs @ 2025-09-30 13:03 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-mm, Andrew Morton, Xu Xin, Chengming Zhou, Peter Xu,
Axel Rasmussen, Mike Kravetz, linux-kernel, stable
On Tue, Sep 30, 2025 at 08:45:48AM +0200, David Hildenbrand wrote:
> On 30.09.25 08:39, Jakub Acs wrote:
> >syzkaller discovered the following crash: (kernel BUG)
> >
> >[ 44.607039] ------------[ cut here ]------------
> >[ 44.607422] kernel BUG at mm/userfaultfd.c:2067!
> >[ 44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> >[ 44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
> >[ 44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> >[ 44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
> >
> ><snip other registers, drop unreliable trace>
> >
> >[ 44.617726] Call Trace:
> >[ 44.617926] <TASK>
> >[ 44.619284] userfaultfd_release+0xef/0x1b0
> >[ 44.620976] __fput+0x3f9/0xb60
> >[ 44.621240] fput_close_sync+0x110/0x210
> >[ 44.622222] __x64_sys_close+0x8f/0x120
> >[ 44.622530] do_syscall_64+0x5b/0x2f0
> >[ 44.622840] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >[ 44.623244] RIP: 0033:0x7f365bb3f227
> >
> >Kernel panics because it detects UFFD inconsistency during
> >userfaultfd_release_all(). Specifically, a VMA which has a valid pointer
> >to vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.
> >
> >The inconsistency is caused in ksm_madvise(): when user calls madvise()
> >with MADV_UNMERGEABLE on a VMA that is registered for UFFD in MINOR
> >mode, it accidentally clears all flags stored in the upper 32 bits of
> >vma->vm_flags.
> >
> >Assuming x86_64 kernel build, unsigned long is 64-bit and unsigned int
> >and int are 32-bit wide. This setup causes the following mishap during
> >the &= ~VM_MERGEABLE assignment.
> >
> >VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
> >After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
> >promoted to unsigned long before the & operation. This promotion fills
> >upper 32 bits with leading 0s, as we're doing unsigned conversion (and
> >even for a signed conversion, this wouldn't help as the leading bit is
> >0). & operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
> >instead of intended 0xffff'ffff'7fff'ffff and hence accidentally clears
> >the upper 32-bits of its value.
> >
> >Fix it by casting `VM_MERGEABLE` constant to unsigned long to preserve
> >the upper 32 bits, in case it's needed.
> >
> >Note: other VM_* flags are not affected:
> >This only happens to the VM_MERGEABLE flag, as the other VM_* flags are
> >all constants of type int and after ~ operation, they end up with
> >leading 1 and are thus converted to unsigned long with leading 1s.
> >
> >Note 2:
> >After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
> >no longer a kernel BUG, but a WARNING at the same place:
> >
> >[ 45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067
> >
> >but the root-cause (flag-drop) remains the same.
> >
> >Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")
> >Signed-off-by: Jakub Acs <acsjakub@amazon.de>
> >Cc: Andrew Morton <akpm@linux-foundation.org>
> >Cc: David Hildenbrand <david@redhat.com>
> >Cc: Xu Xin <xu.xin16@zte.com.cn>
> >Cc: Chengming Zhou <chengming.zhou@linux.dev>
> >Cc: Peter Xu <peterx@redhat.com>
> >Cc: Axel Rasmussen <axelrasmussen@google.com>
> >Cc: Mike Kravetz <mike.kravetz@oracle.com>
> >Cc: linux-mm@kvack.org
> >Cc: linux-kernel@vger.kernel.org
> >Cc: stable@vger.kernel.org
> >---
> >
> >I looked around the kernel and found one more flag that might be
> >causing similar issues: "IORESOURCE_BUSY" - as its inverted version is
> >bit-anded to unsigned long fields. However, it seems those fields don't
> >actually use any bits from upper 32-bits as flags (yet?).
> >
> >I also considered changing the constant definition by adding ULL, but am
> >not sure where else that could blow up, plus it would likely call to
> >define all the related constants as ULL for consistency. If you'd prefer
> >that fix, let me know.
> >
> >
> > mm/ksm.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> >diff --git a/mm/ksm.c b/mm/ksm.c
> >index 160787bb121c..c24137a1eeb7 100644
> >--- a/mm/ksm.c
> >+++ b/mm/ksm.c
> >@@ -2871,7 +2871,7 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
> > return err;
> > }
> >- *vm_flags &= ~VM_MERGEABLE;
> >+ *vm_flags &= ~((unsigned long) VM_MERGEABLE);
> > break;
> > }
>
> Wouldn't it be better to just do
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 1ae97a0b8ec75..0eaf8af153f98 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -296,7 +296,7 @@ extern unsigned int kobjsize(const void *objp);
> #define VM_MIXEDMAP 0x10000000 /* Can contain "struct page" and pure PFN pages */
> #define VM_HUGEPAGE 0x20000000 /* MADV_HUGEPAGE marked this vma */
> #define VM_NOHUGEPAGE 0x40000000 /* MADV_NOHUGEPAGE marked this vma */
> -#define VM_MERGEABLE 0x80000000 /* KSM may merge identical pages */
> +#define VM_MERGEABLE 0x80000000ul /* KSM may merge identical pages */
> #ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
> #define VM_HIGH_ARCH_BIT_0 32 /* bit only usable on 64-bit architectures */
>
>
> And for consistency doing it to all other flags as well? After all we have
>
> typedef unsigned long vm_flags_t;
>
Makes sense, sent v2:
https://lore.kernel.org/all/20250930130023.60106-1-acsjakub@amazon.de/
Thank you,
Jakub