* [syzbot] [mm?] kernel BUG in __khugepaged_enter
From: syzbot @ 2026-02-14 16:40 UTC
To: Liam.Howlett, akpm, baohua, baolin.wang, david, dev.jain, lance.yang, linux-kernel, linux-mm, lorenzo.stoakes, npache, ryan.roberts, syzkaller-bugs, ziy

Hello,

syzbot found the following issue on:

HEAD commit: 1e83ccd5921a sched/mmcid: Don't assume CID is CPU owned on..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1169dae6580000
kernel config: https://syzkaller.appspot.com/x/.config?x=54ae71b284dd0e13
dashboard link: https://syzkaller.appspot.com/bug?extid=6b554d491efbe066b701
compiler: gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/ed43f42e3ea1/disk-1e83ccd5.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/d8af54a32588/vmlinux-1e83ccd5.xz
kernel image: https://storage.googleapis.com/syzbot-assets/34e6a8cc1037/bzImage-1e83ccd5.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+6b554d491efbe066b701@syzkaller.appspotmail.com

------------[ cut here ]------------
kernel BUG at mm/khugepaged.c:438!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 16472 Comm: syz.3.2372 Tainted: G U W L XTNJ syzkaller #0 PREEMPT(full)
Tainted: [U]=USER, [W]=WARN, [L]=SOFTLOCKUP, [X]=AUX, [T]=RANDSTRUCT, [N]=TEST, [J]=FWCTL
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/24/2026
RIP: 0010:__khugepaged_enter+0x30a/0x380 mm/khugepaged.c:438
Code: 64 7e 8e e8 a8 dc 66 ff e8 93 e6 8d ff 5b 5d 41 5c 41 5d 41 5e 41 5f e9 04 6c 04 09 e8 7f e6 8d ff 48 89 df e8 17 33 d9 ff 90 <0f> 0b 48 89 ef e8 dc 51 f8 ff e9 3b fd ff ff e8 f2 52 f8 ff e9 e1
RSP: 0018:ffffc9000e98fba8 EFLAGS: 00010292
RAX: 000000000000031f RBX: ffff888079b24980 RCX: 0000000000000000
RDX: 000000000000031f RSI: ffffffff81e5b2c9 RDI: fffff52001d31f1c
RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000080000000 R11: 0000000000000001 R12: 0000000008100177
R13: ffff88804adf9510 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f06093436c0(0000) GS:ffff8881245b1000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fff341d3f52 CR3: 00000000319b0000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 khugepaged_enter_vma mm/khugepaged.c:467 [inline]
 khugepaged_enter_vma+0x137/0x2c0 mm/khugepaged.c:461
 do_huge_pmd_anonymous_page+0x1c8/0x1c00 mm/huge_memory.c:1469
 create_huge_pmd mm/memory.c:6102 [inline]
 __handle_mm_fault+0x1e96/0x2b50 mm/memory.c:6376
 handle_mm_fault+0x36d/0xa20 mm/memory.c:6583
 do_user_addr_fault+0x5a3/0x12f0 arch/x86/mm/fault.c:1334
 handle_page_fault arch/x86/mm/fault.c:1474 [inline]
 exc_page_fault+0x6f/0xd0 arch/x86/mm/fault.c:1527
 asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:618
RIP: 0033:0x87560
Code: Unable to access opcode bytes at 0x87536.
RSP: 002b:000000000000000e EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00007f0608615fa0 RCX: 00007f060839bf79
RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0002000020003b4a
RBP: 00007f06084327e0 R08: 0000000000000103 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f0608616038 R14: 00007f0608615fa0 R15: 00007ffee482e7a8
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__khugepaged_enter+0x30a/0x380 mm/khugepaged.c:438
Code: 64 7e 8e e8 a8 dc 66 ff e8 93 e6 8d ff 5b 5d 41 5c 41 5d 41 5e 41 5f e9 04 6c 04 09 e8 7f e6 8d ff 48 89 df e8 17 33 d9 ff 90 <0f> 0b 48 89 ef e8 dc 51 f8 ff e9 3b fd ff ff e8 f2 52 f8 ff e9 e1
RSP: 0018:ffffc9000e98fba8 EFLAGS: 00010292
RAX: 000000000000031f RBX: ffff888079b24980 RCX: 0000000000000000
RDX: 000000000000031f RSI: ffffffff81e5b2c9 RDI: fffff52001d31f1c
RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000080000000 R11: 0000000000000001 R12: 0000000008100177
R13: ffff88804adf9510 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f06093436c0(0000) GS:ffff8881245b1000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055906703f168 CR3: 00000000319b0000 CR4: 00000000003526f0


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup
* Re: [syzbot] [mm?] kernel BUG in __khugepaged_enter
From: David Hildenbrand (Arm) @ 2026-02-16 14:40 UTC
To: syzbot, Liam.Howlett, akpm, baohua, baolin.wang, dev.jain, lance.yang, linux-kernel, linux-mm, lorenzo.stoakes, npache, ryan.roberts, syzkaller-bugs, ziy

On 2/14/26 17:40, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
[...]
> ------------[ cut here ]------------
> kernel BUG at mm/khugepaged.c:438!
> Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
[...]
> RIP: 0010:__khugepaged_enter+0x30a/0x380 mm/khugepaged.c:438
[...]
> Call Trace:
>  <TASK>
>  khugepaged_enter_vma mm/khugepaged.c:467 [inline]
>  khugepaged_enter_vma+0x137/0x2c0 mm/khugepaged.c:461
>  do_huge_pmd_anonymous_page+0x1c8/0x1c00 mm/huge_memory.c:1469
>  create_huge_pmd mm/memory.c:6102 [inline]
>  __handle_mm_fault+0x1e96/0x2b50 mm/memory.c:6376
>  handle_mm_fault+0x36d/0xa20 mm/memory.c:6583
>  do_user_addr_fault+0x5a3/0x12f0 arch/x86/mm/fault.c:1334
>  handle_page_fault arch/x86/mm/fault.c:1474 [inline]
>  exc_page_fault+0x6f/0xd0 arch/x86/mm/fault.c:1527
>  asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:618

This is the VM_BUG_ON_MM(hpage_collapse_test_exit(mm), mm), which checks

	atomic_read(&mm->mm_users) == 0;

So we have mm->mm_users == 0 while processing a page fault. Weird.

-- 
Cheers,

David
* Re: [syzbot] [mm?] kernel BUG in __khugepaged_enter
From: Lorenzo Stoakes @ 2026-02-16 14:43 UTC
To: David Hildenbrand (Arm)
Cc: syzbot, Liam.Howlett, akpm, baohua, baolin.wang, dev.jain, lance.yang, linux-kernel, linux-mm, npache, ryan.roberts, syzkaller-bugs, ziy

On Mon, Feb 16, 2026 at 03:40:21PM +0100, David Hildenbrand (Arm) wrote:
> On 2/14/26 17:40, syzbot wrote:
[...]
> > ------------[ cut here ]------------
> > kernel BUG at mm/khugepaged.c:438!
[...]
>
> This is the VM_BUG_ON_MM(hpage_collapse_test_exit(mm), mm), which checks
>
> 	atomic_read(&mm->mm_users) == 0;
>
> So we have mm->mm_users == 0 while processing a page fault. Weird.

mm lifecycle... pinging Liam :)

Cheers,
Lorenzo
* Re: [syzbot] [mm?] kernel BUG in __khugepaged_enter
From: Lorenzo Stoakes @ 2026-02-16 16:08 UTC
To: David Hildenbrand (Arm)
Cc: syzbot, Liam.Howlett, akpm, baohua, baolin.wang, dev.jain, lance.yang, linux-kernel, linux-mm, npache, ryan.roberts, syzkaller-bugs, ziy, Thomas Gleixner

+cc Thomas in case the commit it's sat at is indicative, there does seem to be
some weirdness with MMF_MULTIPROCESS processes (i.e. CLONE_VM but !CLONE_THREAD)
resulting in possible memory corruption?

We kinda need a repro to be sure though I think...

On Mon, Feb 16, 2026 at 02:43:17PM +0000, Lorenzo Stoakes wrote:
> On Mon, Feb 16, 2026 at 03:40:21PM +0100, David Hildenbrand (Arm) wrote:
> > On 2/14/26 17:40, syzbot wrote:
[...]
> > > Unfortunately, I don't have any reproducer for this issue yet.

We're going to need one I fear :)

[...]
> > > Call Trace:
> > >  <TASK>
> > >  khugepaged_enter_vma mm/khugepaged.c:467 [inline]
> > >  khugepaged_enter_vma+0x137/0x2c0 mm/khugepaged.c:461
> > >  do_huge_pmd_anonymous_page+0x1c8/0x1c00 mm/huge_memory.c:1469
> > >  create_huge_pmd mm/memory.c:6102 [inline]
> > >  __handle_mm_fault+0x1e96/0x2b50 mm/memory.c:6376
> > >  handle_mm_fault+0x36d/0xa20 mm/memory.c:6583
> > >  do_user_addr_fault+0x5a3/0x12f0 arch/x86/mm/fault.c:1334

	vma = lock_vma_under_rcu(mm, address);
	if (!vma)
		goto lock_mmap; <--- didn't jump there, so is a VMA lock.

	if (unlikely(access_error(error_code, vma))) {
		bad_area_access_error(regs, error_code, address, NULL, vma);
		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
		return;
	}
	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs); <-- here

> > >  handle_page_fault arch/x86/mm/fault.c:1474 [inline]
> > >  exc_page_fault+0x6f/0xd0 arch/x86/mm/fault.c:1527
> > >  asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:618
> >
> > This is the VM_BUG_ON_MM(hpage_collapse_test_exit(mm), mm), which checks
> >
> > 	atomic_read(&mm->mm_users) == 0;

Yeah, and that just shouldn't be possible, so maybe memory corruption?

The crash log indicates the system is tainted by a soft lockup
https://syzkaller.appspot.com/text?tag=CrashLog&x=1169dae6580000 so something's
gone horribly wrong there...

(from crash log)
[ 696.104336][T16472] pgd ffff8880319b0000 mm_users 0 mm_count 2 pgtables_bytes 155648 map_count 32

VMAs still there so exit_mmap() hasn't run yet...

But hmm we injected a fault :)

[ 696.293779][T16475] FAULT_INJECTION: forcing a failure.
[ 696.293779][T16475] name fail_page_alloc, interval 1, probability 0, space 0, times 0

[ 696.332139][T16475] dump_stack_lvl+0x100/0x190
[ 696.332164][T16475] should_fail_ex.cold+0x5/0xa
[ 696.332178][T16475] ? prepare_alloc_pages+0x16d/0x5f0
[ 696.332200][T16475] should_fail_alloc_page+0xeb/0x140
[ 696.332219][T16475] prepare_alloc_pages+0x1f0/0x5f0
[ 696.332241][T16475] __alloc_frozen_pages_noprof+0x193/0x2410
[ 696.332258][T16475] ? stack_trace_save+0x8e/0xc0
[ 696.332277][T16475] ? __pfx_stack_trace_save+0x10/0x10
[ 696.332297][T16475] ? stack_depot_save_flags+0x27/0x9d0
[ 696.332315][T16475] ? __lock_acquire+0x4a5/0x2630
[ 696.332331][T16475] ? kasan_save_stack+0x3f/0x50
[ 696.332346][T16475] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10
[ 696.332360][T16475] ? copy_time_ns+0xf6/0x800
[ 696.332379][T16475] ? unshare_nsproxy_namespaces+0xc3/0x1f0
[ 696.332408][T16475] ? __x64_sys_unshare+0x31/0x40
[ 696.332423][T16475] ? do_syscall_64+0x106/0xf80
[ 696.332437][T16475] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 696.332460][T16475] ? __sanitizer_cov_trace_switch+0x54/0x90
[ 696.332480][T16475] ? policy_nodemask+0xed/0x4f0
[ 696.332500][T16475] alloc_pages_mpol+0x1fb/0x550
[ 696.332519][T16475] ? __pfx_alloc_pages_mpol+0x10/0x10
[ 696.332542][T16475] alloc_pages_noprof+0x131/0x390
[ 696.332560][T16475] copy_time_ns+0x11a/0x800

So:

static struct nsproxy *create_new_namespaces(u64 flags,
	struct task_struct *tsk, struct user_namespace *user_ns,
	struct fs_struct *new_fs)
{
	...
	new_nsp->time_ns_for_children = copy_time_ns(flags, user_ns,
					tsk->nsproxy->time_ns_for_children); <- -ENOMEM
	if (IS_ERR(new_nsp->time_ns_for_children)) {
		err = PTR_ERR(new_nsp->time_ns_for_children);
		goto out_time;
	}
	...
out_time:
	put_net(new_nsp->net_ns);
out_net:
	put_cgroup_ns(new_nsp->cgroup_ns);
out_cgroup:
	put_pid_ns(new_nsp->pid_ns_for_children);
out_pid:
	put_ipc_ns(new_nsp->ipc_ns);
out_ipc:
	put_uts_ns(new_nsp->uts_ns);
out_uts:
	put_mnt_ns(new_nsp->mnt_ns);
out_ns:
	kmem_cache_free(nsproxy_cachep, new_nsp);
	return ERR_PTR(err);
}

So we're putting the world... maybe some of this is buggy?

[ 696.332578][T16475] ? copy_cgroup_ns+0x71/0x970
[ 696.332601][T16475] create_new_namespaces+0x48a/0xac0

So:

int unshare_nsproxy_namespaces(unsigned long unshare_flags,
	struct nsproxy **new_nsp, struct cred *new_cred, struct fs_struct *new_fs)
{
	...
	*new_nsp = create_new_namespaces(unshare_flags, current, user_
					 new_fs ? new_fs : current->fs);
	if (IS_ERR(*new_nsp)) {
		err = PTR_ERR(*new_nsp);
		goto out;
	}
	...
out:
	return err;
}

[ 696.332626][T16475] unshare_nsproxy_namespaces+0xc3/0x1f0
[ 696.332648][T16475] ksys_unshare+0x455/0xab0

So:

int ksys_unshare(unsigned long unshare_flags)
{
	...
	err = unshare_nsproxy_namespaces(unshare_flags, &new_nsproxy,
					 new_cred, new_fs);
	if (err)
		goto bad_unshare_cleanup_cred;
	...
bad_unshare_cleanup_cred:
	if (new_cred)
		put_cred(new_cred);
bad_unshare_cleanup_fd:
	if (new_fd)
		put_files_struct(new_fd);
bad_unshare_cleanup_fs:
	if (new_fs)
		free_fs_struct(new_fs);
bad_unshare_out:
	return err;
}

And again we're putting all the things... maybe something buggy here?

Perhaps this unshare is racing with something else?

OTOH, we _already_ had mm_users = 0 at this point (as per the mm dump), so
probably something before got us into this state?

[ 696.332664][T16475] ? __pfx_ksys_unshare+0x10/0x10
[ 696.332679][T16475] ? xfd_validate_state+0x129/0x190
[ 696.332702][T16475] __x64_sys_unshare+0x31/0x40
[ 696.332717][T16475] do_syscall_64+0x106/0xf80
[ 696.332730][T16475] ? clear_bhb_loop+0x40/0x90
[ 696.332747][T16475] entry_SYSCALL_64_after_hwframe+0x77/0x7f

Also from mm dump:

flags: 00000000,840007fd

MMF_TOPDOWN | MMF_MULTIPROCESS | (core dump flags)

No MMF_VM_HUGEPAGE...

MMF_MULTIPROCESS marks this as shared between processes, as set in
copy_process() -> copy_oom_score_adj() which has a guard:

	/* Skip if spawning a thread or using vfork */
	if ((clone_flags & (CLONE_VM | CLONE_THREAD | CLONE_VFORK)) != CLONE_VM)
		return;

__set_oom_adj() then grabs the mm when this flag is set, and commit
44a70adec910 ("mm, oom_adj: make sure processes sharing mm have same view of
oom_score_adj") suggests the processes were cloned with CLONE_VM but not
CLONE_SIGHAND (which implies !CLONE_THREAD, since CLONE_THREAD requires
CLONE_SIGHAND).

Anyway it's hard to know without a repro.

Cheers,
Lorenzo
* Re: [syzbot] [mm?] kernel BUG in __khugepaged_enter
From: Liam R. Howlett @ 2026-02-17 17:30 UTC
To: Lorenzo Stoakes
Cc: David Hildenbrand (Arm), syzbot, akpm, baohua, baolin.wang, dev.jain, lance.yang, linux-kernel, linux-mm, npache, ryan.roberts, syzkaller-bugs, ziy, Thomas Gleixner

* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [260216 11:08]:
> +cc Thomas in case the commit it's sat at is indicative, there does seem to be
> some weirdness with MMF_MULTIPROCESS processes (i.e. CLONE_VM but !CLONE_THREAD)
> resulting in possible memory corruption?
>
> We kinda need a repro to be sure though I think...
>
> On Mon, Feb 16, 2026 at 02:43:17PM +0000, Lorenzo Stoakes wrote:
> > On Mon, Feb 16, 2026 at 03:40:21PM +0100, David Hildenbrand (Arm) wrote:
> > > On 2/14/26 17:40, syzbot wrote:
[...]
> > > > Unfortunately, I don't have any reproducer for this issue yet.
>
> We're going to need one I fear :)

Indeed. No clear smoking gun.

[...]
> > >
> > > This is the VM_BUG_ON_MM(hpage_collapse_test_exit(mm), mm), which checks
> > >
> > > 	atomic_read(&mm->mm_users) == 0;
>
> Yeah, and that just shouldn't be possible, so maybe memory corruption?

tglx had some changes to the mm_users accounting for cid stuff [1]. There is
also a potential double mmput() fix [2].

>
> The crash log indicates the system is tainted by a soft lockup
> https://syzkaller.appspot.com/text?tag=CrashLog&x=1169dae6580000 so something's
> gone horribly wrong there...
>
> (from crash log)
> [ 696.104336][T16472] pgd ffff8880319b0000 mm_users 0 mm_count 2 pgtables_bytes 155648 map_count 32
>
> VMAs still there so exit_mmap() hasn't run yet...
>
> But hmm we injected a fault :)
>
> [ 696.293779][T16475] FAULT_INJECTION: forcing a failure.
> [ 696.293779][T16475] name fail_page_alloc, interval 1, probability 0, space 0, times 0
>
[...]
> [ 696.332560][T16475] copy_time_ns+0x11a/0x800

There are a number of failures reported; all seem to be the same injections.

...

> Perhaps this unshare is racing with something else?

Yeah, I cannot find a particular obvious bug either.

>
> OTOH, we _already_ had mm_users = 0 at this point (as per the mm dump), so
> probably something before got us into this state?

This may be a previous failure that was missed or expected? Is the syzbot log
from a clean run, or does it have the full terminal output? The outputs seem
to indicate it's the 'last test run' but not the entire log..?

Could we have already hit the double mmput() that [2] fixes? That fix is not
in the tree where this bug is being reported.

[...]
>
> Anyway it's hard to know without a repro.
>

Thanks,
Liam

[1] https://lore.kernel.org/all/20251119172549.832764634@linutronix.de/
[2] https://lore.kernel.org/all/20260210192738.3041609-1-andrii@kernel.org/