linux-mm.kvack.org archive mirror
* [PATCH v3 0/2] mm, ksm: fix flag-dropping behavior
@ 2025-10-01  9:03 Jakub Acs
  2025-10-01  9:03 ` [PATCH v3 1/2] mm/ksm: fix flag-dropping behavior in ksm_madvise Jakub Acs
  2025-10-01  9:03 ` [PATCH v3 2/2] mm: redefine VM_* flag constants with BIT() Jakub Acs
  0 siblings, 2 replies; 14+ messages in thread
From: Jakub Acs @ 2025-10-01  9:03 UTC (permalink / raw)
  To: linux-mm
  Cc: acsjakub, akpm, david, xu.xin16, chengming.zhou, peterx,
	axelrasmussen, linux-kernel

Hi,

This series fixes a bug in ksm_madvise described in [1/2].
As David asked in his reply to v2, I split the change into two
commits:
- the first contains a minimal fix for the bug, so that it can be
  backported on its own
- the second makes the VM_* flag definitions consistent

v1: https://lore.kernel.org/all/20250930063921.62354-1-acsjakub@amazon.de/
v2: https://lore.kernel.org/all/20250930130023.60106-1-acsjakub@amazon.de/

Jakub Acs (2):
  mm/ksm: fix flag-dropping behavior in ksm_madvise
  mm: redefine VM_* flag constants with BIT()

 include/linux/mm.h | 68 +++++++++++++++++++++++-----------------------
 1 file changed, 34 insertions(+), 34 deletions(-)

-- 
2.47.3




Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597




* [PATCH v3 1/2] mm/ksm: fix flag-dropping behavior in ksm_madvise
  2025-10-01  9:03 [PATCH v3 0/2] mm, ksm: fix flag-dropping behavior Jakub Acs
@ 2025-10-01  9:03 ` Jakub Acs
  2025-10-01 14:06   ` David Hildenbrand
                     ` (2 more replies)
  2025-10-01  9:03 ` [PATCH v3 2/2] mm: redefine VM_* flag constants with BIT() Jakub Acs
  1 sibling, 3 replies; 14+ messages in thread
From: Jakub Acs @ 2025-10-01  9:03 UTC (permalink / raw)
  To: linux-mm
  Cc: acsjakub, akpm, david, xu.xin16, chengming.zhou, peterx,
	axelrasmussen, linux-kernel, stable

syzkaller discovered the following crash (a kernel BUG):

[   44.607039] ------------[ cut here ]------------
[   44.607422] kernel BUG at mm/userfaultfd.c:2067!
[   44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[   44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
[   44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[   44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460

<snip other registers, drop unreliable trace>

[   44.617726] Call Trace:
[   44.617926]  <TASK>
[   44.619284]  userfaultfd_release+0xef/0x1b0
[   44.620976]  __fput+0x3f9/0xb60
[   44.621240]  fput_close_sync+0x110/0x210
[   44.622222]  __x64_sys_close+0x8f/0x120
[   44.622530]  do_syscall_64+0x5b/0x2f0
[   44.622840]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   44.623244] RIP: 0033:0x7f365bb3f227

The kernel panics because it detects a UFFD inconsistency during
userfaultfd_release_all(). Specifically, it finds a VMA which has a valid
pointer in vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.

The inconsistency is caused in ksm_madvise(): when a user calls madvise()
with MADV_UNMERGEABLE on a VMA that is registered for UFFD in MINOR
mode, it accidentally clears all flags stored in the upper 32 bits of
vma->vm_flags.

Assuming an x86_64 kernel build, unsigned long is 64 bits wide, and
unsigned int and int are 32 bits wide. This setup causes the following
mishap during the &= ~VM_MERGEABLE assignment.

VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
converted to unsigned long before the & operation. This conversion fills
the upper 32 bits with leading 0s, as it is an unsigned conversion (and
even a signed conversion would not help, as the leading bit is 0). The &
operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
instead of the intended 0xffff'ffff'7fff'ffff and hence accidentally
clears the upper 32 bits of its value.

Fix it by changing `VM_MERGEABLE` constant to unsigned long, using the
BIT() macro.
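
For illustration, the mask arithmetic described above can be reproduced
in plain C. This is a minimal user-space sketch, not kernel code:
OLD_VM_MERGEABLE, NEW_VM_MERGEABLE, EXAMPLE_HIGH_FLAG and both helper
functions are hypothetical names, uint64_t stands in for the 64-bit
unsigned long vm_flags, and bit 38 is an arbitrary bit in the upper half
rather than a specific kernel flag.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins, not the kernel's definitions. */
#define OLD_VM_MERGEABLE	0x80000000		/* type: unsigned int */
#define NEW_VM_MERGEABLE	(UINT64_C(1) << 31)	/* models BIT(31) on a 64-bit build */
#define EXAMPLE_HIGH_FLAG	(UINT64_C(1) << 38)	/* hypothetical upper-half flag */

static_assert(sizeof(unsigned int) == 4, "sketch assumes a 32-bit int");

/* Clear the flag using the old, unsigned-int-typed constant. */
static inline uint64_t clear_mergeable_old(uint64_t flags)
{
	/*
	 * ~0x80000000 is 0x7fffffff as unsigned int; widening it to
	 * 64 bits zero-extends, so the mask also clears bits 32..63.
	 */
	return flags & ~OLD_VM_MERGEABLE;
}

/* Clear the flag using a constant that is already 64 bits wide. */
static inline uint64_t clear_mergeable_new(uint64_t flags)
{
	/* ~(1 << 31) in 64 bits is 0xffffffff7fffffff: only bit 31 is cleared. */
	return flags & ~NEW_VM_MERGEABLE;
}
```

Calling clear_mergeable_old() on a value that has both bit 31 and the
upper-half flag set loses the upper-half flag as well, while
clear_mergeable_new() keeps it.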

Note: other VM_* flags are not affected:
This only happens to the VM_MERGEABLE flag, as the other VM_* flags are
all constants of type int; after the ~ operation, they end up with a
leading 1 and are thus sign-extended to unsigned long with leading 1s.
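
A minimal user-space sketch of the two conversions, assuming a 32-bit
int. The helper names are hypothetical, and uint64_t again stands in for
a 64-bit unsigned long:

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(int) == 4, "sketch assumes a 32-bit int");

/* 0x40000000 fits in int, so the constant has type int. */
static inline uint64_t widen_inverted_int_flag(void)
{
	/*
	 * ~0x40000000 is a negative int. Converting it to a 64-bit
	 * unsigned type keeps the value modulo 2^64, so the upper
	 * 32 bits all become 1s.
	 */
	return (uint64_t)~0x40000000;
}

/* 0x80000000 does not fit in int, so the constant has type unsigned int. */
static inline uint64_t widen_inverted_uint_flag(void)
{
	/* ~0x80000000 is 0x7fffffff unsigned int; widening zero-extends it. */
	return (uint64_t)~0x80000000;
}
```

Only the second helper yields a mask with zeroed upper bits, which is
why VM_MERGEABLE was the one flag that misbehaved.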

Note 2:
After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
no longer a kernel BUG, but a WARNING at the same place:

[   45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067

but the root-cause (flag-drop) remains the same.

Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")
Signed-off-by: Jakub Acs <acsjakub@amazon.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Xu Xin <xu.xin16@zte.com.cn>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
---
 include/linux/mm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1ae97a0b8ec7..c6794d0e24eb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -296,7 +296,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_MIXEDMAP	0x10000000	/* Can contain "struct page" and pure PFN pages */
 #define VM_HUGEPAGE	0x20000000	/* MADV_HUGEPAGE marked this vma */
 #define VM_NOHUGEPAGE	0x40000000	/* MADV_NOHUGEPAGE marked this vma */
-#define VM_MERGEABLE	0x80000000	/* KSM may merge identical pages */
+#define VM_MERGEABLE	BIT(31)		/* KSM may merge identical pages */
 
 #ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
 #define VM_HIGH_ARCH_BIT_0	32	/* bit only usable on 64-bit architectures */
-- 
2.47.3








* [PATCH v3 2/2] mm: redefine VM_* flag constants with BIT()
  2025-10-01  9:03 [PATCH v3 0/2] mm, ksm: fix flag-dropping behavior Jakub Acs
  2025-10-01  9:03 ` [PATCH v3 1/2] mm/ksm: fix flag-dropping behavior in ksm_madvise Jakub Acs
@ 2025-10-01  9:03 ` Jakub Acs
  2025-10-01 14:04   ` David Hildenbrand
  2025-10-01 16:51   ` SeongJae Park
  1 sibling, 2 replies; 14+ messages in thread
From: Jakub Acs @ 2025-10-01  9:03 UTC (permalink / raw)
  To: linux-mm
  Cc: acsjakub, akpm, david, xu.xin16, chengming.zhou, peterx,
	axelrasmussen, linux-kernel

Make VM_* flag constant definitions consistent - unify all to use BIT()
macro.

This is a separate follow-up fix after we changed VM_MERGEABLE
separately to isolate the bugfix for easier backporting, as suggested by
David in [1].

[1]: https://lore.kernel.org/all/85f852f9-8577-4230-adc7-c52e7f479454@redhat.com/

Signed-off-by: Jakub Acs <acsjakub@amazon.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Xu Xin <xu.xin16@zte.com.cn>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 include/linux/mm.h | 66 +++++++++++++++++++++++-----------------------
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c6794d0e24eb..88cab3d7eea2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -246,56 +246,56 @@ extern unsigned int kobjsize(const void *objp);
  * vm_flags in vm_area_struct, see mm_types.h.
  * When changing, update also include/trace/events/mmflags.h
  */
-#define VM_NONE		0x00000000
+#define VM_NONE		0
 
-#define VM_READ		0x00000001	/* currently active flags */
-#define VM_WRITE	0x00000002
-#define VM_EXEC		0x00000004
-#define VM_SHARED	0x00000008
+#define VM_READ		BIT(0)		/* currently active flags */
+#define VM_WRITE	BIT(1)
+#define VM_EXEC		BIT(2)
+#define VM_SHARED	BIT(3)
 
 /* mprotect() hardcodes VM_MAYREAD >> 4 == VM_READ, and so for r/w/x bits. */
-#define VM_MAYREAD	0x00000010	/* limits for mprotect() etc */
-#define VM_MAYWRITE	0x00000020
-#define VM_MAYEXEC	0x00000040
-#define VM_MAYSHARE	0x00000080
+#define VM_MAYREAD	BIT(4)		/* limits for mprotect() etc */
+#define VM_MAYWRITE	BIT(5)
+#define VM_MAYEXEC	BIT(6)
+#define VM_MAYSHARE	BIT(7)
 
-#define VM_GROWSDOWN	0x00000100	/* general info on the segment */
+#define VM_GROWSDOWN	BIT(8)		/* general info on the segment */
 #ifdef CONFIG_MMU
-#define VM_UFFD_MISSING	0x00000200	/* missing pages tracking */
+#define VM_UFFD_MISSING	BIT(9)		/* missing pages tracking */
 #else /* CONFIG_MMU */
-#define VM_MAYOVERLAY	0x00000200	/* nommu: R/O MAP_PRIVATE mapping that might overlay a file mapping */
+#define VM_MAYOVERLAY	BIT(10)		/* nommu: R/O MAP_PRIVATE mapping that might overlay a file mapping */
 #define VM_UFFD_MISSING	0
 #endif /* CONFIG_MMU */
-#define VM_PFNMAP	0x00000400	/* Page-ranges managed without "struct page", just pure PFN */
-#define VM_UFFD_WP	0x00001000	/* wrprotect pages tracking */
+#define VM_PFNMAP	BIT(11)		/* Page-ranges managed without "struct page", just pure PFN */
+#define VM_UFFD_WP	BIT(12)		/* wrprotect pages tracking */
 
-#define VM_LOCKED	0x00002000
-#define VM_IO           0x00004000	/* Memory mapped I/O or similar */
+#define VM_LOCKED	BIT(13)
+#define VM_IO           BIT(14)		/* Memory mapped I/O or similar */
 
 					/* Used by sys_madvise() */
-#define VM_SEQ_READ	0x00008000	/* App will access data sequentially */
-#define VM_RAND_READ	0x00010000	/* App will not benefit from clustered reads */
-
-#define VM_DONTCOPY	0x00020000      /* Do not copy this vma on fork */
-#define VM_DONTEXPAND	0x00040000	/* Cannot expand with mremap() */
-#define VM_LOCKONFAULT	0x00080000	/* Lock the pages covered when they are faulted in */
-#define VM_ACCOUNT	0x00100000	/* Is a VM accounted object */
-#define VM_NORESERVE	0x00200000	/* should the VM suppress accounting */
-#define VM_HUGETLB	0x00400000	/* Huge TLB Page VM */
-#define VM_SYNC		0x00800000	/* Synchronous page faults */
-#define VM_ARCH_1	0x01000000	/* Architecture-specific flag */
-#define VM_WIPEONFORK	0x02000000	/* Wipe VMA contents in child. */
-#define VM_DONTDUMP	0x04000000	/* Do not include in the core dump */
+#define VM_SEQ_READ	BIT(15)		/* App will access data sequentially */
+#define VM_RAND_READ	BIT(16)		/* App will not benefit from clustered reads */
+
+#define VM_DONTCOPY	BIT(17)		/* Do not copy this vma on fork */
+#define VM_DONTEXPAND	BIT(18)		/* Cannot expand with mremap() */
+#define VM_LOCKONFAULT	BIT(19)		/* Lock the pages covered when they are faulted in */
+#define VM_ACCOUNT	BIT(20)		/* Is a VM accounted object */
+#define VM_NORESERVE	BIT(21)		/* should the VM suppress accounting */
+#define VM_HUGETLB	BIT(22)		/* Huge TLB Page VM */
+#define VM_SYNC		BIT(23)		/* Synchronous page faults */
+#define VM_ARCH_1	BIT(24)		/* Architecture-specific flag */
+#define VM_WIPEONFORK	BIT(25)		/* Wipe VMA contents in child. */
+#define VM_DONTDUMP	BIT(26)		/* Do not include in the core dump */
 
 #ifdef CONFIG_MEM_SOFT_DIRTY
-# define VM_SOFTDIRTY	0x08000000	/* Not soft dirty clean area */
+# define VM_SOFTDIRTY	BIT(27)		/* Not soft dirty clean area */
 #else
 # define VM_SOFTDIRTY	0
 #endif
 
-#define VM_MIXEDMAP	0x10000000	/* Can contain "struct page" and pure PFN pages */
-#define VM_HUGEPAGE	0x20000000	/* MADV_HUGEPAGE marked this vma */
-#define VM_NOHUGEPAGE	0x40000000	/* MADV_NOHUGEPAGE marked this vma */
+#define VM_MIXEDMAP	BIT(28)		/* Can contain "struct page" and pure PFN pages */
+#define VM_HUGEPAGE	BIT(29)		/* MADV_HUGEPAGE marked this vma */
+#define VM_NOHUGEPAGE	BIT(30)		/* MADV_NOHUGEPAGE marked this vma */
 #define VM_MERGEABLE	BIT(31)		/* KSM may merge identical pages */
 
 #ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
-- 
2.47.3








* Re: [PATCH v3 2/2] mm: redefine VM_* flag constants with BIT()
  2025-10-01  9:03 ` [PATCH v3 2/2] mm: redefine VM_* flag constants with BIT() Jakub Acs
@ 2025-10-01 14:04   ` David Hildenbrand
  2025-10-02  8:03     ` Jakub Acs
  2025-10-01 16:51   ` SeongJae Park
  1 sibling, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2025-10-01 14:04 UTC (permalink / raw)
  To: Jakub Acs, linux-mm
  Cc: akpm, xu.xin16, chengming.zhou, peterx, axelrasmussen, linux-kernel

>   
> -#define VM_GROWSDOWN	0x00000100	/* general info on the segment */
> +#define VM_GROWSDOWN	BIT(8)		/* general info on the segment */
>   #ifdef CONFIG_MMU
> -#define VM_UFFD_MISSING	0x00000200	/* missing pages tracking */
> +#define VM_UFFD_MISSING	BIT(9)		/* missing pages tracking */
>   #else /* CONFIG_MMU */
> -#define VM_MAYOVERLAY	0x00000200	/* nommu: R/O MAP_PRIVATE mapping that might overlay a file mapping */
> +#define VM_MAYOVERLAY	BIT(10)		/* nommu: R/O MAP_PRIVATE mapping that might overlay a file mapping */

Careful: VM_UFFD_MISSING and VM_MAYOVERLAY share the same bit (9); which 
one is defined depends on CONFIG_MMU.

>   #define VM_UFFD_MISSING	0
>   #endif /* CONFIG_MMU */
> -#define VM_PFNMAP	0x00000400	/* Page-ranges managed without "struct page", just pure PFN */
> -#define VM_UFFD_WP	0x00001000	/* wrprotect pages tracking */
> +#define VM_PFNMAP	BIT(11)		/* Page-ranges managed without "struct page", just pure PFN */

-> 10

11 is actually unused IIUC.

> +#define VM_UFFD_WP	BIT(12)		/* wrprotect pages tracking */
>   

This seems to be correct again.
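
The mapping between the old hex values and bit indices discussed in the
comments above can be checked mechanically. A minimal user-space sketch:
EXAMPLE_BIT() is a local stand-in modeled on the kernel's BIT() macro,
and example_bit_index() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/* Local stand-in modeled on the kernel's BIT() macro. */
#define EXAMPLE_BIT(nr)	(1UL << (nr))

/*
 * Return the index of the single set bit in flag, or -1 if flag does
 * not have exactly one bit set.
 */
static inline int example_bit_index(unsigned long flag)
{
	int nr = 0;

	if (flag == 0 || (flag & (flag - 1)) != 0)
		return -1;
	while ((flag & 1UL) == 0) {
		flag >>= 1;
		nr++;
	}
	return nr;
}

/* Compile-time checks against the pre-patch hex values. */
static_assert(EXAMPLE_BIT(9) == 0x00000200, "VM_UFFD_MISSING / VM_MAYOVERLAY");
static_assert(EXAMPLE_BIT(10) == 0x00000400, "VM_PFNMAP");
static_assert(EXAMPLE_BIT(12) == 0x00001000, "VM_UFFD_WP");
```

A conversion that assigns a different index than example_bit_index()
reports for the old hex value changes the flag's position.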


IIRC, Andrew prefers not mixing fixes and cleanups in the same series if 
possible. So you might just want to send out patch #1 separately, and 
send out patch #2 separately with a note under the --- that it depends 
on patch #1.

-- 
Cheers

David / dhildenb




* Re: [PATCH v3 1/2] mm/ksm: fix flag-dropping behavior in ksm_madvise
  2025-10-01  9:03 ` [PATCH v3 1/2] mm/ksm: fix flag-dropping behavior in ksm_madvise Jakub Acs
@ 2025-10-01 14:06   ` David Hildenbrand
  2025-10-01 16:43   ` SeongJae Park
  2025-11-06 10:39   ` Vlastimil Babka
  2 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2025-10-01 14:06 UTC (permalink / raw)
  To: Jakub Acs, linux-mm
  Cc: akpm, xu.xin16, chengming.zhou, peterx, axelrasmussen,
	linux-kernel, stable

On 01.10.25 11:03, Jakub Acs wrote:
> syzkaller discovered the following crash: (kernel BUG)
> 
> [   44.607039] ------------[ cut here ]------------
> [   44.607422] kernel BUG at mm/userfaultfd.c:2067!
> [   44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> [   44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
> [   44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> [   44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
> 
> <snip other registers, drop unreliable trace>
> 
> [   44.617726] Call Trace:
> [   44.617926]  <TASK>
> [   44.619284]  userfaultfd_release+0xef/0x1b0
> [   44.620976]  __fput+0x3f9/0xb60
> [   44.621240]  fput_close_sync+0x110/0x210
> [   44.622222]  __x64_sys_close+0x8f/0x120
> [   44.622530]  do_syscall_64+0x5b/0x2f0
> [   44.622840]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [   44.623244] RIP: 0033:0x7f365bb3f227
> 
> Kernel panics because it detects UFFD inconsistency during
> userfaultfd_release_all(). Specifically, a VMA which has a valid pointer
> to vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.
> 
> The inconsistency is caused in ksm_madvise(): when user calls madvise()
> with MADV_UNMERGEABLE on a VMA that is registered for UFFD in MINOR
> mode, it accidentally clears all flags stored in the upper 32 bits of
> vma->vm_flags.
> 
> Assuming x86_64 kernel build, unsigned long is 64-bit and unsigned int
> and int are 32-bit wide. This setup causes the following mishap during
> the &= ~VM_MERGEABLE assignment.
> 
> VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
> After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
> promoted to unsigned long before the & operation. This promotion fills
> upper 32 bits with leading 0s, as we're doing unsigned conversion (and
> even for a signed conversion, this wouldn't help as the leading bit is
> 0). & operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
> instead of intended 0xffff'ffff'7fff'ffff and hence accidentally clears
> the upper 32-bits of its value.
> 
> Fix it by changing `VM_MERGEABLE` constant to unsigned long, using the
> BIT() macro.
> 
> Note: other VM_* flags are not affected:
> This only happens to the VM_MERGEABLE flag, as the other VM_* flags are
> all constants of type int and after ~ operation, they end up with
> leading 1 and are thus converted to unsigned long with leading 1s.
> 
> Note 2:
> After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
> no longer a kernel BUG, but a WARNING at the same place:
> 
> [   45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067
> 
> but the root-cause (flag-drop) remains the same.
> 
> Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")

Very likely we want to CC stable.

> Signed-off-by: Jakub Acs <acsjakub@amazon.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Xu Xin <xu.xin16@zte.com.cn>
> Cc: Chengming Zhou <chengming.zhou@linux.dev>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Axel Rasmussen <axelrasmussen@google.com>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org
> ---

IMHO no need to resend this one if Andrew can just pick this one up. 
Then, you can send out patch #2 separately as commented in reply to 
patch #2.

Thanks!

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers

David / dhildenb




* Re: [PATCH v3 1/2] mm/ksm: fix flag-dropping behavior in ksm_madvise
  2025-10-01  9:03 ` [PATCH v3 1/2] mm/ksm: fix flag-dropping behavior in ksm_madvise Jakub Acs
  2025-10-01 14:06   ` David Hildenbrand
@ 2025-10-01 16:43   ` SeongJae Park
  2025-11-06 10:39   ` Vlastimil Babka
  2 siblings, 0 replies; 14+ messages in thread
From: SeongJae Park @ 2025-10-01 16:43 UTC (permalink / raw)
  To: Jakub Acs
  Cc: SeongJae Park, linux-mm, akpm, david, xu.xin16, chengming.zhou,
	peterx, axelrasmussen, linux-kernel, stable

On Wed, 1 Oct 2025 09:03:52 +0000 Jakub Acs <acsjakub@amazon.de> wrote:

> syzkaller discovered the following crash: (kernel BUG)
> 
> [   44.607039] ------------[ cut here ]------------
> [   44.607422] kernel BUG at mm/userfaultfd.c:2067!
> [   44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> [   44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
> [   44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> [   44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
> 
> <snip other registers, drop unreliable trace>
> 
> [   44.617726] Call Trace:
> [   44.617926]  <TASK>
> [   44.619284]  userfaultfd_release+0xef/0x1b0
> [   44.620976]  __fput+0x3f9/0xb60
> [   44.621240]  fput_close_sync+0x110/0x210
> [   44.622222]  __x64_sys_close+0x8f/0x120
> [   44.622530]  do_syscall_64+0x5b/0x2f0
> [   44.622840]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [   44.623244] RIP: 0033:0x7f365bb3f227
> 
> Kernel panics because it detects UFFD inconsistency during
> userfaultfd_release_all(). Specifically, a VMA which has a valid pointer
> to vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.
> 
> The inconsistency is caused in ksm_madvise(): when user calls madvise()
> with MADV_UNMERGEABLE on a VMA that is registered for UFFD in MINOR
> mode, it accidentally clears all flags stored in the upper 32 bits of
> vma->vm_flags.
> 
> Assuming x86_64 kernel build, unsigned long is 64-bit and unsigned int
> and int are 32-bit wide. This setup causes the following mishap during
> the &= ~VM_MERGEABLE assignment.
> 
> VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
> After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
> promoted to unsigned long before the & operation. This promotion fills
> upper 32 bits with leading 0s, as we're doing unsigned conversion (and
> even for a signed conversion, this wouldn't help as the leading bit is
> 0). & operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
> instead of intended 0xffff'ffff'7fff'ffff and hence accidentally clears
> the upper 32-bits of its value.
> 
> Fix it by changing `VM_MERGEABLE` constant to unsigned long, using the
> BIT() macro.

Nice!

> 
> Note: other VM_* flags are not affected:
> This only happens to the VM_MERGEABLE flag, as the other VM_* flags are
> all constants of type int and after ~ operation, they end up with
> leading 1 and are thus converted to unsigned long with leading 1s.
> 
> Note 2:
> After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
> no longer a kernel BUG, but a WARNING at the same place:
> 
> [   45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067
> 
> but the root-cause (flag-drop) remains the same.
> 
> Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")

Nit.  It is recommended [1] to use 12 characters of the SHA-1 ID, but you are
using 13 characters.

> Signed-off-by: Jakub Acs <acsjakub@amazon.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Xu Xin <xu.xin16@zte.com.cn>
> Cc: Chengming Zhou <chengming.zhou@linux.dev>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Axel Rasmussen <axelrasmussen@google.com>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org

Nit.  It would be nice to place this just after the 'Fixes:' tag.

Acked-by: SeongJae Park <sj@kernel.org>

[1] https://docs.kernel.org/process/submitting-patches.html#describe-your-changes


Thanks,
SJ

[...]



* Re: [PATCH v3 2/2] mm: redefine VM_* flag constants with BIT()
  2025-10-01  9:03 ` [PATCH v3 2/2] mm: redefine VM_* flag constants with BIT() Jakub Acs
  2025-10-01 14:04   ` David Hildenbrand
@ 2025-10-01 16:51   ` SeongJae Park
  2025-10-02  7:29     ` David Hildenbrand
  1 sibling, 1 reply; 14+ messages in thread
From: SeongJae Park @ 2025-10-01 16:51 UTC (permalink / raw)
  To: Jakub Acs
  Cc: SeongJae Park, linux-mm, akpm, david, xu.xin16, chengming.zhou,
	peterx, axelrasmussen, linux-kernel

On Wed, 1 Oct 2025 09:03:53 +0000 Jakub Acs <acsjakub@amazon.de> wrote:

> Make VM_* flag constant definitions consistent - unify all to use BIT()
> macro.
> 
> This is a separate follow-up fix after we changed VM_MERGEABLE
> separately to isolate bugfix for easier backporting. As suggested by
> David in [1]. 
> 
> [1]: https://lore.kernel.org/all/85f852f9-8577-4230-adc7-c52e7f479454@redhat.com/
> 
> Signed-off-by: Jakub Acs <acsjakub@amazon.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Xu Xin <xu.xin16@zte.com.cn>
> Cc: Chengming Zhou <chengming.zhou@linux.dev>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Axel Rasmussen <axelrasmussen@google.com>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  include/linux/mm.h | 66 +++++++++++++++++++++++-----------------------
>  1 file changed, 33 insertions(+), 33 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index c6794d0e24eb..88cab3d7eea2 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -246,56 +246,56 @@ extern unsigned int kobjsize(const void *objp);
>   * vm_flags in vm_area_struct, see mm_types.h.
>   * When changing, update also include/trace/events/mmflags.h
>   */
> -#define VM_NONE		0x00000000
> +#define VM_NONE		0

I'm wondering if it could be more consistent to use 0UL instead.

>  
> -#define VM_READ		0x00000001	/* currently active flags */
> -#define VM_WRITE	0x00000002
> -#define VM_EXEC		0x00000004
> -#define VM_SHARED	0x00000008
> +#define VM_READ		BIT(0)		/* currently active flags */
> +#define VM_WRITE	BIT(1)
> +#define VM_EXEC		BIT(2)
> +#define VM_SHARED	BIT(3)
>  
>  /* mprotect() hardcodes VM_MAYREAD >> 4 == VM_READ, and so for r/w/x bits. */
> -#define VM_MAYREAD	0x00000010	/* limits for mprotect() etc */
> -#define VM_MAYWRITE	0x00000020
> -#define VM_MAYEXEC	0x00000040
> -#define VM_MAYSHARE	0x00000080
> +#define VM_MAYREAD	BIT(4)		/* limits for mprotect() etc */
> +#define VM_MAYWRITE	BIT(5)
> +#define VM_MAYEXEC	BIT(6)
> +#define VM_MAYSHARE	BIT(7)
>  
> -#define VM_GROWSDOWN	0x00000100	/* general info on the segment */
> +#define VM_GROWSDOWN	BIT(8)		/* general info on the segment */
>  #ifdef CONFIG_MMU
> -#define VM_UFFD_MISSING	0x00000200	/* missing pages tracking */
> +#define VM_UFFD_MISSING	BIT(9)		/* missing pages tracking */
>  #else /* CONFIG_MMU */
> -#define VM_MAYOVERLAY	0x00000200	/* nommu: R/O MAP_PRIVATE mapping that might overlay a file mapping */
> +#define VM_MAYOVERLAY	BIT(10)		/* nommu: R/O MAP_PRIVATE mapping that might overlay a file mapping */

s/10/9/ ?

>  #define VM_UFFD_MISSING	0
>  #endif /* CONFIG_MMU */
> -#define VM_PFNMAP	0x00000400	/* Page-ranges managed without "struct page", just pure PFN */
> -#define VM_UFFD_WP	0x00001000	/* wrprotect pages tracking */
> +#define VM_PFNMAP	BIT(11)		/* Page-ranges managed without "struct page", just pure PFN */

s/11/10/ ?

> +#define VM_UFFD_WP	BIT(12)		/* wrprotect pages tracking */
>  
> -#define VM_LOCKED	0x00002000
> -#define VM_IO           0x00004000	/* Memory mapped I/O or similar */
> +#define VM_LOCKED	BIT(13)
> +#define VM_IO           BIT(14)		/* Memory mapped I/O or similar */
>  
>  					/* Used by sys_madvise() */
> -#define VM_SEQ_READ	0x00008000	/* App will access data sequentially */
> -#define VM_RAND_READ	0x00010000	/* App will not benefit from clustered reads */
> -
> -#define VM_DONTCOPY	0x00020000      /* Do not copy this vma on fork */
> -#define VM_DONTEXPAND	0x00040000	/* Cannot expand with mremap() */
> -#define VM_LOCKONFAULT	0x00080000	/* Lock the pages covered when they are faulted in */
> -#define VM_ACCOUNT	0x00100000	/* Is a VM accounted object */
> -#define VM_NORESERVE	0x00200000	/* should the VM suppress accounting */
> -#define VM_HUGETLB	0x00400000	/* Huge TLB Page VM */
> -#define VM_SYNC		0x00800000	/* Synchronous page faults */
> -#define VM_ARCH_1	0x01000000	/* Architecture-specific flag */
> -#define VM_WIPEONFORK	0x02000000	/* Wipe VMA contents in child. */
> -#define VM_DONTDUMP	0x04000000	/* Do not include in the core dump */
> +#define VM_SEQ_READ	BIT(15)		/* App will access data sequentially */
> +#define VM_RAND_READ	BIT(16)		/* App will not benefit from clustered reads */
> +
> +#define VM_DONTCOPY	BIT(17)		/* Do not copy this vma on fork */
> +#define VM_DONTEXPAND	BIT(18)		/* Cannot expand with mremap() */
> +#define VM_LOCKONFAULT	BIT(19)		/* Lock the pages covered when they are faulted in */
> +#define VM_ACCOUNT	BIT(20)		/* Is a VM accounted object */
> +#define VM_NORESERVE	BIT(21)		/* should the VM suppress accounting */
> +#define VM_HUGETLB	BIT(22)		/* Huge TLB Page VM */
> +#define VM_SYNC		BIT(23)		/* Synchronous page faults */
> +#define VM_ARCH_1	BIT(24)		/* Architecture-specific flag */
> +#define VM_WIPEONFORK	BIT(25)		/* Wipe VMA contents in child. */
> +#define VM_DONTDUMP	BIT(26)		/* Do not include in the core dump */
>  
>  #ifdef CONFIG_MEM_SOFT_DIRTY
> -# define VM_SOFTDIRTY	0x08000000	/* Not soft dirty clean area */
> +# define VM_SOFTDIRTY	BIT(27)		/* Not soft dirty clean area */
>  #else
>  # define VM_SOFTDIRTY	0
>  #endif
>  
> -#define VM_MIXEDMAP	0x10000000	/* Can contain "struct page" and pure PFN pages */
> -#define VM_HUGEPAGE	0x20000000	/* MADV_HUGEPAGE marked this vma */
> -#define VM_NOHUGEPAGE	0x40000000	/* MADV_NOHUGEPAGE marked this vma */
> +#define VM_MIXEDMAP	BIT(28)		/* Can contain "struct page" and pure PFN pages */
> +#define VM_HUGEPAGE	BIT(29)		/* MADV_HUGEPAGE marked this vma */
> +#define VM_NOHUGEPAGE	BIT(30)		/* MADV_NOHUGEPAGE marked this vma */
>  #define VM_MERGEABLE	BIT(31)		/* KSM may merge identical pages */
>  
>  #ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
> -- 
> 2.47.3


Thanks,
SJ

[...]



* Re: [PATCH v3 2/2] mm: redefine VM_* flag constants with BIT()
  2025-10-01 16:51   ` SeongJae Park
@ 2025-10-02  7:29     ` David Hildenbrand
  2025-10-02 17:39       ` SeongJae Park
  0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2025-10-02  7:29 UTC (permalink / raw)
  To: SeongJae Park, Jakub Acs
  Cc: linux-mm, akpm, xu.xin16, chengming.zhou, peterx, axelrasmussen,
	linux-kernel

On 01.10.25 18:51, SeongJae Park wrote:
> On Wed, 1 Oct 2025 09:03:53 +0000 Jakub Acs <acsjakub@amazon.de> wrote:
> 
>> Make VM_* flag constant definitions consistent - unify all to use BIT()
>> macro.
>>
>> This is a separate follow-up fix after we changed VM_MERGEABLE
>> separately to isolate bugfix for easier backporting. As suggested by
>> David in [1].
>>
>> [1]: https://lore.kernel.org/all/85f852f9-8577-4230-adc7-c52e7f479454@redhat.com/
>>
>> Signed-off-by: Jakub Acs <acsjakub@amazon.de>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Xu Xin <xu.xin16@zte.com.cn>
>> Cc: Chengming Zhou <chengming.zhou@linux.dev>
>> Cc: Peter Xu <peterx@redhat.com>
>> Cc: Axel Rasmussen <axelrasmussen@google.com>
>> Cc: linux-mm@kvack.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
>>   include/linux/mm.h | 66 +++++++++++++++++++++++-----------------------
>>   1 file changed, 33 insertions(+), 33 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index c6794d0e24eb..88cab3d7eea2 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -246,56 +246,56 @@ extern unsigned int kobjsize(const void *objp);
>>    * vm_flags in vm_area_struct, see mm_types.h.
>>    * When changing, update also include/trace/events/mmflags.h
>>    */
>> -#define VM_NONE		0x00000000
>> +#define VM_NONE		0
> 
> I'm wondering if it could be more consistent to use 0UL instead.

Not really required, and if we're switching to BIT() already, there is 
not a lot of consistency to be had. It would be different if we were 
avoiding BIT(), as in patch v2.

-- 
Cheers

David / dhildenb




* Re: [PATCH v3 2/2] mm: redefine VM_* flag constants with BIT()
  2025-10-01 14:04   ` David Hildenbrand
@ 2025-10-02  8:03     ` Jakub Acs
  0 siblings, 0 replies; 14+ messages in thread
From: Jakub Acs @ 2025-10-02  8:03 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, akpm, xu.xin16, chengming.zhou, peterx, axelrasmussen,
	linux-kernel

On Wed, Oct 01, 2025 at 04:04:51PM +0200, David Hildenbrand wrote:
> >-#define VM_GROWSDOWN	0x00000100	/* general info on the segment */
> >+#define VM_GROWSDOWN	BIT(8)		/* general info on the segment */
> >  #ifdef CONFIG_MMU
> >-#define VM_UFFD_MISSING	0x00000200	/* missing pages tracking */
> >+#define VM_UFFD_MISSING	BIT(9)		/* missing pages tracking */
> >  #else /* CONFIG_MMU */
> >-#define VM_MAYOVERLAY	0x00000200	/* nommu: R/O MAP_PRIVATE mapping that might overlay a file mapping */
> >+#define VM_MAYOVERLAY	BIT(10)		/* nommu: R/O MAP_PRIVATE mapping that might overlay a file mapping */
> 
> Careful: VM_UFFD_MISSING and VM_MAYOVERLAY share the same bit,
> depending on CONFIG_MMU (9).
> 
> >  #define VM_UFFD_MISSING	0
> >  #endif /* CONFIG_MMU */
> >-#define VM_PFNMAP	0x00000400	/* Page-ranges managed without "struct page", just pure PFN */
> >-#define VM_UFFD_WP	0x00001000	/* wrprotect pages tracking */
> >+#define VM_PFNMAP	BIT(11)		/* Page-ranges managed without "struct page", just pure PFN */
> 
> -> 10

Ugh, thanks for catching these!

> 
> 11 is actually unused IIUC.
> 
> >+#define VM_UFFD_WP	BIT(12)		/* wrprotect pages tracking */
> 
> This seems to be correct again.
> 
> 
> IIRC, Andrew prefers not mixing fixes and cleanups in the same
> series if possible. So you might just want to send out patch #1
> separately and, send out patch #2 separately with a note under the
> --- that it depends on patch #1.
> 

I saw that patch #1 was applied so will leave that alone now, but took a
note for future.

For completeness adding the link to v4:
https://lore.kernel.org/all/20251002075202.11306-1-acsjakub@amazon.de/
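
As an aside, conversion slips like the ones caught above (VM_MAYOVERLAY
has to stay on bit 9, VM_PFNMAP on bit 10) can be checked mechanically.
Below is a minimal userspace sketch, not kernel code. It assumes a local
stand-in for BIT(); the OLD_* names are hypothetical and only hold the
hex values from the quoted diff, and single_bit_index() is a helper made
up for illustration.

```c
#include <assert.h>

/* Local stand-in for the kernel's BIT() macro (assumption for this sketch). */
#define BIT(n) (1UL << (n))

/* Old hex values copied from the quoted diff; the OLD_* names are hypothetical. */
#define OLD_VM_GROWSDOWN	0x00000100
#define OLD_VM_UFFD_MISSING	0x00000200
#define OLD_VM_MAYOVERLAY	0x00000200	/* shares its bit with VM_UFFD_MISSING */
#define OLD_VM_PFNMAP		0x00000400
#define OLD_VM_UFFD_WP		0x00001000

/* Compile-time checks: each BIT() index must reproduce the old value. */
_Static_assert(BIT(8)  == OLD_VM_GROWSDOWN,    "VM_GROWSDOWN is bit 8");
_Static_assert(BIT(9)  == OLD_VM_UFFD_MISSING, "VM_UFFD_MISSING is bit 9");
_Static_assert(BIT(9)  == OLD_VM_MAYOVERLAY,   "VM_MAYOVERLAY is bit 9");
_Static_assert(BIT(10) == OLD_VM_PFNMAP,       "VM_PFNMAP is bit 10");
_Static_assert(BIT(12) == OLD_VM_UFFD_WP,      "VM_UFFD_WP is bit 12");

/* Return the index of the only set bit in v, or -1 if v is not a single bit. */
static int single_bit_index(unsigned long v)
{
	int idx = 0;

	if (v == 0 || (v & (v - 1)) != 0)
		return -1;
	while ((v & 1UL) == 0) {
		v >>= 1;
		idx++;
	}
	return idx;
}
```

With the indices from v3 (BIT(10) for VM_MAYOVERLAY, BIT(11) for
VM_PFNMAP) the corresponding assertions would fail at compile time.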

Many Thanks,
Jakub







* Re: [PATCH v3 2/2] mm: redefine VM_* flag constants with BIT()
  2025-10-02  7:29     ` David Hildenbrand
@ 2025-10-02 17:39       ` SeongJae Park
  0 siblings, 0 replies; 14+ messages in thread
From: SeongJae Park @ 2025-10-02 17:39 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: SeongJae Park, Jakub Acs, linux-mm, akpm, xu.xin16,
	chengming.zhou, peterx, axelrasmussen, linux-kernel

On Thu, 2 Oct 2025 09:29:37 +0200 David Hildenbrand <david@redhat.com> wrote:

> On 01.10.25 18:51, SeongJae Park wrote:
> > On Wed, 1 Oct 2025 09:03:53 +0000 Jakub Acs <acsjakub@amazon.de> wrote:
> > 
> >> Make VM_* flag constant definitions consistent - unify all to use BIT()
> >> macro.
> >>
> >> This is a separate follow-up fix after we changed VM_MERGEABLE
> >> separately to isolate bugfix for easier backporting. As suggested by
> >> David in [1].
> >>
> >> [1]: https://lore.kernel.org/all/85f852f9-8577-4230-adc7-c52e7f479454@redhat.com/
> >>
> >> Signed-off-by: Jakub Acs <acsjakub@amazon.de>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Cc: David Hildenbrand <david@redhat.com>
> >> Cc: Xu Xin <xu.xin16@zte.com.cn>
> >> Cc: Chengming Zhou <chengming.zhou@linux.dev>
> >> Cc: Peter Xu <peterx@redhat.com>
> >> Cc: Axel Rasmussen <axelrasmussen@google.com>
> >> Cc: linux-mm@kvack.org
> >> Cc: linux-kernel@vger.kernel.org
> >> ---
> >>   include/linux/mm.h | 66 +++++++++++++++++++++++-----------------------
> >>   1 file changed, 33 insertions(+), 33 deletions(-)
> >>
> >> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >> index c6794d0e24eb..88cab3d7eea2 100644
> >> --- a/include/linux/mm.h
> >> +++ b/include/linux/mm.h
> >> @@ -246,56 +246,56 @@ extern unsigned int kobjsize(const void *objp);
> >>    * vm_flags in vm_area_struct, see mm_types.h.
> >>    * When changing, update also include/trace/events/mmflags.h
> >>    */
> >> -#define VM_NONE		0x00000000
> >> +#define VM_NONE		0
> > 
> > I'm wondering if it could be more consistent to use 0UL instead.
> 
> > Not really required, and if we're switching to BIT() already there is not 
> > a lot of consistency to be had. It would be different if we were avoiding 
> > BIT(), as in patch v2.

Agreed. :)


Thanks,
SJ

[...]



* Re: [PATCH v3 1/2] mm/ksm: fix flag-dropping behavior in ksm_madvise
  2025-10-01  9:03 ` [PATCH v3 1/2] mm/ksm: fix flag-dropping behavior in ksm_madvise Jakub Acs
  2025-10-01 14:06   ` David Hildenbrand
  2025-10-01 16:43   ` SeongJae Park
@ 2025-11-06 10:39   ` Vlastimil Babka
  2025-11-06 11:16     ` David Hildenbrand (Red Hat)
                       ` (2 more replies)
  2 siblings, 3 replies; 14+ messages in thread
From: Vlastimil Babka @ 2025-11-06 10:39 UTC (permalink / raw)
  To: Jakub Acs, linux-mm, Hugh Dickins, Jann Horn, Lorenzo Stoakes
  Cc: akpm, david, xu.xin16, chengming.zhou, peterx, axelrasmussen,
	linux-kernel, stable

On 10/1/25 11:03, Jakub Acs wrote:
> syzkaller discovered the following crash: (kernel BUG)
> 
> [   44.607039] ------------[ cut here ]------------
> [   44.607422] kernel BUG at mm/userfaultfd.c:2067!
> [   44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> [   44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
> [   44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> [   44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
> 
> <snip other registers, drop unreliable trace>
> 
> [   44.617726] Call Trace:
> [   44.617926]  <TASK>
> [   44.619284]  userfaultfd_release+0xef/0x1b0
> [   44.620976]  __fput+0x3f9/0xb60
> [   44.621240]  fput_close_sync+0x110/0x210
> [   44.622222]  __x64_sys_close+0x8f/0x120
> [   44.622530]  do_syscall_64+0x5b/0x2f0
> [   44.622840]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [   44.623244] RIP: 0033:0x7f365bb3f227
> 
> Kernel panics because it detects UFFD inconsistency during
> userfaultfd_release_all(). Specifically, a VMA which has a valid pointer
> to vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.
> 
> The inconsistency is caused in ksm_madvise(): when user calls madvise()
> with MADV_UNMERGEABLE on a VMA that is registered for UFFD in MINOR
> mode, it accidentally clears all flags stored in the upper 32 bits of
> vma->vm_flags.
> 
> Assuming x86_64 kernel build, unsigned long is 64-bit and unsigned int
> and int are 32-bit wide. This setup causes the following mishap during
> the &= ~VM_MERGEABLE assignment.
> 
> VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
> After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
> promoted to unsigned long before the & operation. This promotion fills
> upper 32 bits with leading 0s, as we're doing unsigned conversion (and
> even for a signed conversion, this wouldn't help as the leading bit is
> 0). & operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
> instead of intended 0xffff'ffff'7fff'ffff and hence accidentally clears
> the upper 32-bits of its value.
> 
> Fix it by changing `VM_MERGEABLE` constant to unsigned long, using the
> BIT() macro.
> 
> Note: other VM_* flags are not affected:
> This only happens to the VM_MERGEABLE flag, as the other VM_* flags are
> all constants of type int and after ~ operation, they end up with
> leading 1 and are thus converted to unsigned long with leading 1s.
> 
> Note 2:
> After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
> no longer a kernel BUG, but a WARNING at the same place:
> 
> [   45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067
> 
> but the root-cause (flag-drop) remains the same.
> 
> Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")

Late to the party, but it seems to me the correct Fixes: should be
f8af4da3b4c1 ("ksm: the mm interface to ksm")
which introduced the flag and the buggy clearing code, no?

Commit 7677f7fd8be76 is just one that notices it, right? But there are other
flags in >32 bit area, including pkeys etc. Sounds rather dangerous if they
can be cleared using a madvise.

So we can't amend the Fixes: tag now, but maybe we could advise stable to
backport to versions even older than those based on 7677f7fd8be76?
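
For illustration, here is a minimal userspace sketch of the promotion
issue described in the quoted commit message. It is not the kernel code:
the OLD_*/NEW_*/HIGH_FLAG macros and the function names are hypothetical
stand-ins, uint64_t plays the role of a 64-bit unsigned long vm_flags,
and int is assumed to be 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the flag definitions discussed in this thread. */
#define OLD_VM_NOHUGEPAGE	0x40000000		/* fits in int, so type int */
#define OLD_VM_MERGEABLE	0x80000000		/* too big for int: unsigned int */
#define NEW_VM_MERGEABLE	(UINT64_C(1) << 31)	/* 64-bit, like BIT(31) on x86_64 */
#define HIGH_FLAG		(UINT64_C(1) << 32)	/* any flag above bit 31 */

/* Old definition: ~0x80000000 is 0x7fffffff (unsigned int), zero-extended. */
static uint64_t clear_mergeable_old(uint64_t vm_flags)
{
	return vm_flags & ~OLD_VM_MERGEABLE;
}

/* Fixed definition: the complement is computed in 64 bits, upper bits stay set. */
static uint64_t clear_mergeable_new(uint64_t vm_flags)
{
	return vm_flags & ~NEW_VM_MERGEABLE;
}

/* Other flags: ~0x40000000 is a negative int, so it is sign-extended. */
static uint64_t clear_nohugepage_old(uint64_t vm_flags)
{
	return vm_flags & ~OLD_VM_NOHUGEPAGE;
}
```

Under these assumptions, clear_mergeable_old(HIGH_FLAG | OLD_VM_MERGEABLE)
yields 0, so the high flag is lost, while clear_mergeable_new() on the
same flags keeps HIGH_FLAG. clear_nohugepage_old() also keeps it, which
matches the note that the other VM_* constants are not affected.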

> Signed-off-by: Jakub Acs <acsjakub@amazon.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Xu Xin <xu.xin16@zte.com.cn>
> Cc: Chengming Zhou <chengming.zhou@linux.dev>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Axel Rasmussen <axelrasmussen@google.com>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org
> ---
>  include/linux/mm.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 1ae97a0b8ec7..c6794d0e24eb 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -296,7 +296,7 @@ extern unsigned int kobjsize(const void *objp);
>  #define VM_MIXEDMAP	0x10000000	/* Can contain "struct page" and pure PFN pages */
>  #define VM_HUGEPAGE	0x20000000	/* MADV_HUGEPAGE marked this vma */
>  #define VM_NOHUGEPAGE	0x40000000	/* MADV_NOHUGEPAGE marked this vma */
> -#define VM_MERGEABLE	0x80000000	/* KSM may merge identical pages */
> +#define VM_MERGEABLE	BIT(31)		/* KSM may merge identical pages */
>  
>  #ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
>  #define VM_HIGH_ARCH_BIT_0	32	/* bit only usable on 64-bit architectures */




* Re: [PATCH v3 1/2] mm/ksm: fix flag-dropping behavior in ksm_madvise
  2025-11-06 10:39   ` Vlastimil Babka
@ 2025-11-06 11:16     ` David Hildenbrand (Red Hat)
  2025-11-07  9:49     ` Jakub Acs
  2025-11-10 10:00     ` Vlastimil Babka
  2 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-06 11:16 UTC (permalink / raw)
  To: Vlastimil Babka, Jakub Acs, linux-mm, Hugh Dickins, Jann Horn,
	Lorenzo Stoakes
  Cc: akpm, xu.xin16, chengming.zhou, peterx, axelrasmussen,
	linux-kernel, stable

On 06.11.25 11:39, Vlastimil Babka wrote:
> On 10/1/25 11:03, Jakub Acs wrote:
>> syzkaller discovered the following crash: (kernel BUG)
>>
>> [   44.607039] ------------[ cut here ]------------
>> [   44.607422] kernel BUG at mm/userfaultfd.c:2067!
>> [   44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
>> [   44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
>> [   44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
>> [   44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
>>
>> <snip other registers, drop unreliable trace>
>>
>> [   44.617726] Call Trace:
>> [   44.617926]  <TASK>
>> [   44.619284]  userfaultfd_release+0xef/0x1b0
>> [   44.620976]  __fput+0x3f9/0xb60
>> [   44.621240]  fput_close_sync+0x110/0x210
>> [   44.622222]  __x64_sys_close+0x8f/0x120
>> [   44.622530]  do_syscall_64+0x5b/0x2f0
>> [   44.622840]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> [   44.623244] RIP: 0033:0x7f365bb3f227
>>
>> Kernel panics because it detects UFFD inconsistency during
>> userfaultfd_release_all(). Specifically, a VMA which has a valid pointer
>> to vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.
>>
>> The inconsistency is caused in ksm_madvise(): when user calls madvise()
>> with MADV_UNMERGEABLE on a VMA that is registered for UFFD in MINOR
>> mode, it accidentally clears all flags stored in the upper 32 bits of
>> vma->vm_flags.
>>
>> Assuming x86_64 kernel build, unsigned long is 64-bit and unsigned int
>> and int are 32-bit wide. This setup causes the following mishap during
>> the &= ~VM_MERGEABLE assignment.
>>
>> VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
>> After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
>> promoted to unsigned long before the & operation. This promotion fills
>> upper 32 bits with leading 0s, as we're doing unsigned conversion (and
>> even for a signed conversion, this wouldn't help as the leading bit is
>> 0). & operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
>> instead of intended 0xffff'ffff'7fff'ffff and hence accidentally clears
>> the upper 32-bits of its value.
>>
>> Fix it by changing `VM_MERGEABLE` constant to unsigned long, using the
>> BIT() macro.
>>
>> Note: other VM_* flags are not affected:
>> This only happens to the VM_MERGEABLE flag, as the other VM_* flags are
>> all constants of type int and after ~ operation, they end up with
>> leading 1 and are thus converted to unsigned long with leading 1s.
>>
>> Note 2:
>> After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
>> no longer a kernel BUG, but a WARNING at the same place:
>>
>> [   45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067
>>
>> but the root-cause (flag-drop) remains the same.
>>
>> Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")
> 
> Late to the party, but it seems to me the correct Fixes: should be
> f8af4da3b4c1 ("ksm: the mm interface to ksm")
> which introduced the flag and the buggy clearing code, no?
> 
> Commit 7677f7fd8be76 is just one that notices it, right? But there are other
> flags in >32 bit area, including pkeys etc. Sounds rather dangerous if they
> can be cleared using a madvise.
> 
> So we can't amend the Fixes: now but maybe could advise stable to backport
> for even older versions than based on 7677f7fd8be76 ?

Yes, I agree.

-- 
Cheers

David



* Re: [PATCH v3 1/2] mm/ksm: fix flag-dropping behavior in ksm_madvise
  2025-11-06 10:39   ` Vlastimil Babka
  2025-11-06 11:16     ` David Hildenbrand (Red Hat)
@ 2025-11-07  9:49     ` Jakub Acs
  2025-11-10 10:00     ` Vlastimil Babka
  2 siblings, 0 replies; 14+ messages in thread
From: Jakub Acs @ 2025-11-07  9:49 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Hugh Dickins, Jann Horn, Lorenzo Stoakes, akpm, david,
	xu.xin16, chengming.zhou, peterx, axelrasmussen, linux-kernel,
	stable

On Thu, Nov 06, 2025 at 11:39:28AM +0100, Vlastimil Babka wrote:
> On 10/1/25 11:03, Jakub Acs wrote:
> > syzkaller discovered the following crash: (kernel BUG)
> > 
> > [   44.607039] ------------[ cut here ]------------
> > [   44.607422] kernel BUG at mm/userfaultfd.c:2067!
> > [   44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> > [   44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
> > [   44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> > [   44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
> > 
> > <snip other registers, drop unreliable trace>
> > 
> > [   44.617726] Call Trace:
> > [   44.617926]  <TASK>
> > [   44.619284]  userfaultfd_release+0xef/0x1b0
> > [   44.620976]  __fput+0x3f9/0xb60
> > [   44.621240]  fput_close_sync+0x110/0x210
> > [   44.622222]  __x64_sys_close+0x8f/0x120
> > [   44.622530]  do_syscall_64+0x5b/0x2f0
> > [   44.622840]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [   44.623244] RIP: 0033:0x7f365bb3f227
> > 
> > Kernel panics because it detects UFFD inconsistency during
> > userfaultfd_release_all(). Specifically, a VMA which has a valid pointer
> > to vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.
> > 
> > The inconsistency is caused in ksm_madvise(): when user calls madvise()
> > with MADV_UNMERGEABLE on a VMA that is registered for UFFD in MINOR
> > mode, it accidentally clears all flags stored in the upper 32 bits of
> > vma->vm_flags.
> > 
> > Assuming x86_64 kernel build, unsigned long is 64-bit and unsigned int
> > and int are 32-bit wide. This setup causes the following mishap during
> > the &= ~VM_MERGEABLE assignment.
> > 
> > VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
> > After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
> > promoted to unsigned long before the & operation. This promotion fills
> > upper 32 bits with leading 0s, as we're doing unsigned conversion (and
> > even for a signed conversion, this wouldn't help as the leading bit is
> > 0). & operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
> > instead of intended 0xffff'ffff'7fff'ffff and hence accidentally clears
> > the upper 32-bits of its value.
> > 
> > Fix it by changing `VM_MERGEABLE` constant to unsigned long, using the
> > BIT() macro.
> > 
> > Note: other VM_* flags are not affected:
> > This only happens to the VM_MERGEABLE flag, as the other VM_* flags are
> > all constants of type int and after ~ operation, they end up with
> > leading 1 and are thus converted to unsigned long with leading 1s.
> > 
> > Note 2:
> > After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
> > no longer a kernel BUG, but a WARNING at the same place:
> > 
> > [   45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067
> > 
> > but the root-cause (flag-drop) remains the same.
> > 
> > Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")
> 
> Late to the party, but it seems to me the correct Fixes: should be
> f8af4da3b4c1 ("ksm: the mm interface to ksm")
> which introduced the flag and the buggy clearing code, no?
> 
> Commit 7677f7fd8be76 is just one that notices it, right? But there are other
> flags in >32 bit area, including pkeys etc. Sounds rather dangerous if they
> can be cleared using a madvise.
> 
> So we can't amend the Fixes: now but maybe could advise stable to backport
> for even older versions than based on 7677f7fd8be76 ?
> 

Good point. It was a bit tricky to determine the correct "fixes" tag, as
there were more candidates:
- the commit that initially introduced VM_MERGEABLE as a constant with
  different inferred type to other vm_flags constants
- the commit that first started using upper 32 bits of vm_flags and did
  not make sure the constants are defined safely
- f8af4da3b4c1 indeed, as the one that makes the drop actually possible
- 7677f7fd8be76 that shows us a path where the drop manifests

Looking back, I agree f8af4da3b4c1 is the better option, but as you
said, that won't be changed now.

Nevertheless, I'll send the backports after a round of kselftests,
thanks for pointing this out.

Have a good day,
Jakub
 







* Re: [PATCH v3 1/2] mm/ksm: fix flag-dropping behavior in ksm_madvise
  2025-11-06 10:39   ` Vlastimil Babka
  2025-11-06 11:16     ` David Hildenbrand (Red Hat)
  2025-11-07  9:49     ` Jakub Acs
@ 2025-11-10 10:00     ` Vlastimil Babka
  2 siblings, 0 replies; 14+ messages in thread
From: Vlastimil Babka @ 2025-11-10 10:00 UTC (permalink / raw)
  To: Jakub Acs, linux-mm, Hugh Dickins, Jann Horn, Lorenzo Stoakes,
	Dave Hansen
  Cc: akpm, david, xu.xin16, chengming.zhou, peterx, axelrasmussen,
	linux-kernel, stable

On 11/6/25 11:39, Vlastimil Babka wrote:
> On 10/1/25 11:03, Jakub Acs wrote:
>> syzkaller discovered the following crash: (kernel BUG)
>> 
>> [   44.607039] ------------[ cut here ]------------
>> [   44.607422] kernel BUG at mm/userfaultfd.c:2067!
>> [   44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
>> [   44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
>> [   44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
>> [   44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
>> 
>> <snip other registers, drop unreliable trace>
>> 
>> [   44.617726] Call Trace:
>> [   44.617926]  <TASK>
>> [   44.619284]  userfaultfd_release+0xef/0x1b0
>> [   44.620976]  __fput+0x3f9/0xb60
>> [   44.621240]  fput_close_sync+0x110/0x210
>> [   44.622222]  __x64_sys_close+0x8f/0x120
>> [   44.622530]  do_syscall_64+0x5b/0x2f0
>> [   44.622840]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> [   44.623244] RIP: 0033:0x7f365bb3f227
>> 
>> Kernel panics because it detects UFFD inconsistency during
>> userfaultfd_release_all(). Specifically, a VMA which has a valid pointer
>> to vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.
>> 
>> The inconsistency is caused in ksm_madvise(): when user calls madvise()
>> with MADV_UNMERGEABLE on a VMA that is registered for UFFD in MINOR
>> mode, it accidentally clears all flags stored in the upper 32 bits of
>> vma->vm_flags.
>> 
>> Assuming x86_64 kernel build, unsigned long is 64-bit and unsigned int
>> and int are 32-bit wide. This setup causes the following mishap during
>> the &= ~VM_MERGEABLE assignment.
>> 
>> VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
>> After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
>> promoted to unsigned long before the & operation. This promotion fills
>> upper 32 bits with leading 0s, as we're doing unsigned conversion (and
>> even for a signed conversion, this wouldn't help as the leading bit is
>> 0). & operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
>> instead of intended 0xffff'ffff'7fff'ffff and hence accidentally clears
>> the upper 32-bits of its value.
>> 
>> Fix it by changing `VM_MERGEABLE` constant to unsigned long, using the
>> BIT() macro.
>> 
>> Note: other VM_* flags are not affected:
>> This only happens to the VM_MERGEABLE flag, as the other VM_* flags are
>> all constants of type int and after ~ operation, they end up with
>> leading 1 and are thus converted to unsigned long with leading 1s.
>> 
>> Note 2:
>> After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
>> no longer a kernel BUG, but a WARNING at the same place:
>> 
>> [   45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067
>> 
>> but the root-cause (flag-drop) remains the same.
>> 
>> Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")
> 
> Late to the party, but it seems to me the correct Fixes: should be
> f8af4da3b4c1 ("ksm: the mm interface to ksm")
> which introduced the flag and the buggy clearing code, no?

Clarification: flags with bits >31 did not exist at the time of f8af4da3b4c1
as they were only introduced later with 63c17fb8e5a4 ("mm/core,
x86/mm/pkeys: Store protection bits in high VMA flags") (v4.6) so that would
have been the most precise Fixes: commit. Sorry, Hugh :)

But that doesn't affect the stable backports efforts where the oldest LTS is
5.4 anyway.

> Commit 7677f7fd8be76 is just one that notices it, right? But there are other
> flags in >32 bit area, including pkeys etc. Sounds rather dangerous if they
> can be cleared using a madvise.
> 
> So we can't amend the Fixes: now but maybe could advise stable to backport
> for even older versions than based on 7677f7fd8be76 ?
> 
>> Signed-off-by: Jakub Acs <acsjakub@amazon.de>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Xu Xin <xu.xin16@zte.com.cn>
>> Cc: Chengming Zhou <chengming.zhou@linux.dev>
>> Cc: Peter Xu <peterx@redhat.com>
>> Cc: Axel Rasmussen <axelrasmussen@google.com>
>> Cc: linux-mm@kvack.org
>> Cc: linux-kernel@vger.kernel.org
>> Cc: stable@vger.kernel.org
>> ---
>>  include/linux/mm.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 1ae97a0b8ec7..c6794d0e24eb 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -296,7 +296,7 @@ extern unsigned int kobjsize(const void *objp);
>>  #define VM_MIXEDMAP	0x10000000	/* Can contain "struct page" and pure PFN pages */
>>  #define VM_HUGEPAGE	0x20000000	/* MADV_HUGEPAGE marked this vma */
>>  #define VM_NOHUGEPAGE	0x40000000	/* MADV_NOHUGEPAGE marked this vma */
>> -#define VM_MERGEABLE	0x80000000	/* KSM may merge identical pages */
>> +#define VM_MERGEABLE	BIT(31)		/* KSM may merge identical pages */
>>  
>>  #ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
>>  #define VM_HIGH_ARCH_BIT_0	32	/* bit only usable on 64-bit architectures */
> 




end of thread, other threads:[~2025-11-10 10:00 UTC | newest]

Thread overview: 14+ messages
2025-10-01  9:03 [PATCH v3 0/2] mm, ksm: fix flag-dropping behavior Jakub Acs
2025-10-01  9:03 ` [PATCH v3 1/2] mm/ksm: fix flag-dropping behavior in ksm_madvise Jakub Acs
2025-10-01 14:06   ` David Hildenbrand
2025-10-01 16:43   ` SeongJae Park
2025-11-06 10:39   ` Vlastimil Babka
2025-11-06 11:16     ` David Hildenbrand (Red Hat)
2025-11-07  9:49     ` Jakub Acs
2025-11-10 10:00     ` Vlastimil Babka
2025-10-01  9:03 ` [PATCH v3 2/2] mm: redefine VM_* flag constants with BIT() Jakub Acs
2025-10-01 14:04   ` David Hildenbrand
2025-10-02  8:03     ` Jakub Acs
2025-10-01 16:51   ` SeongJae Park
2025-10-02  7:29     ` David Hildenbrand
2025-10-02 17:39       ` SeongJae Park
