[syzbot] [mm?] INFO: rcu detected stall in sys


* [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)
@ 2025-08-22  4:15 syzbot
  2025-08-22 12:08 ` Lorenzo Stoakes
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: syzbot @ 2025-08-22  4:15 UTC
  To: Liam.Howlett, akpm, jannh, linux-kernel, linux-mm,
	lorenzo.stoakes, pfalcato, syzkaller-bugs, vbabka

Hello,

syzbot found the following issue on:

HEAD commit:    be48bcf004f9 Merge tag 'for-6.17-rc2-tag' of git://git.ker..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=136dfba2580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=142508fb116c212f
dashboard link: https://syzkaller.appspot.com/bug?extid=8785aaf121cfb2141e0d
compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=107a43bc580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/24fd400c6842/disk-be48bcf0.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/59146305635d/vmlinux-be48bcf0.xz
kernel image: https://storage.googleapis.com/syzbot-assets/b3e5f65cbcc8/bzImage-be48bcf0.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+8785aaf121cfb2141e0d@syzkaller.appspotmail.com

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-1): P6029/1:b..l P1208/1:b..l P6031/3:b..l P6030/1:b..l
rcu: 	(detected by 1, t=10502 jiffies, g=6285, q=421 ncpus=2)
task:dhcpcd          state:R  running task     stack:28896 pid:6030  tgid:6030  ppid:5513   task_flags:0x400040 flags:0x00004002
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5357 [inline]
 __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
 preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
 irqentry_exit+0x36/0x90 kernel/entry/common.c:197
 asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
RIP: 0010:unwind_next_frame+0xfe7/0x20a0 arch/x86/kernel/unwind_orc.c:664
Code: 85 80 0c 00 00 49 89 6d 40 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 56 10 00 00 <41> 39 5d 00 0f 84 10 06 00 00 bd 01 00 00 00 e9 de f3 ff ff 48 b8
RSP: 0018:ffffc90003cdf6a8 EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffc90003ce0000
RDX: 1ffff9200079bee3 RSI: ffffc90003cdfa70 RDI: ffffc90003cdf758
RBP: ffffc90003cdfae0 R08: 0000000000000001 R09: 0000000000000000
R10: ffffc90003cdf718 R11: 00000000000121e6 R12: ffffc90003cdf768
R13: ffffc90003cdf718 R14: ffffc90003cdfa80 R15: ffffc90003cdf74c
 arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
 stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
 kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
 kasan_record_aux_stack+0xa7/0xc0 mm/kasan/generic.c:548
 slab_free_hook mm/slub.c:2378 [inline]
 slab_free mm/slub.c:4680 [inline]
 kmem_cache_free+0x15a/0x4d0 mm/slub.c:4782
 vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
 do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
 do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
 __vm_munmap+0x19a/0x390 mm/vma.c:3155
 __do_sys_munmap mm/mmap.c:1080 [inline]
 __se_sys_munmap mm/mmap.c:1077 [inline]
 __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb13ec2f2e7
RSP: 002b:00007fffe10fae78 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
RAX: ffffffffffffffda RBX: 0000562bd1443510 RCX: 00007fb13ec2f2e7
RDX: 0000000000000002 RSI: 0000000000004028 RDI: 00007fb13ea1b000
RBP: 00007fffe10faf80 R08: 0000562bd1432470 R09: 0000000000000001
R10: 00007fffe10fadb0 R11: 0000000000000206 R12: 00007fffe10faea0
R13: 00007fb13ec42000 R14: 0000562bd1443510 R15: 0000000000000000
 </TASK>
task:syz-executor    state:R  running task     stack:27632 pid:6031  tgid:6031  ppid:5870   task_flags:0x400000 flags:0x00004000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5357 [inline]
 __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
 preempt_schedule_common+0x44/0xc0 kernel/sched/core.c:7145
 preempt_schedule_thunk+0x16/0x30 arch/x86/entry/thunk.S:12
 __raw_spin_unlock include/linux/spinlock_api_smp.h:143 [inline]
 _raw_spin_unlock+0x3e/0x50 kernel/locking/spinlock.c:186
 spin_unlock include/linux/spinlock.h:391 [inline]
 filemap_map_pages+0xe15/0x1670 mm/filemap.c:3791
 do_fault_around mm/memory.c:5531 [inline]
 do_read_fault mm/memory.c:5564 [inline]
 do_fault mm/memory.c:5707 [inline]
 do_pte_missing+0xe39/0x3ba0 mm/memory.c:4234
 handle_pte_fault mm/memory.c:6052 [inline]
 __handle_mm_fault+0x152a/0x2a50 mm/memory.c:6195
 handle_mm_fault+0x589/0xd10 mm/memory.c:6364
 do_user_addr_fault+0x60c/0x1370 arch/x86/mm/fault.c:1336
 handle_page_fault arch/x86/mm/fault.c:1476 [inline]
 exc_page_fault+0x5c/0xb0 arch/x86/mm/fault.c:1532
 asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
RIP: 0033:0x7f54cd7177c7
RSP: 002b:00007fffb79a5b40 EFLAGS: 00010246
RAX: 00007f54ce525000 RBX: 0000000000000000 RCX: 0000000000000064
RDX: 00007fffb79a5de9 RSI: 0000000000000002 RDI: 00007fffb79a5dd8
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
R13: 00007fffb79a5c48 R14: 0000000000000000 R15: 0000000000000000
 </TASK>
task:kworker/0:3     state:R  running task     stack:25368 pid:1208  tgid:1208  ppid:2      task_flags:0x4208060 flags:0x00004000
Workqueue: events_power_efficient gc_worker
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5357 [inline]
 __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
 preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
 irqentry_exit+0x36/0x90 kernel/entry/common.c:197
 asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
RIP: 0010:write_comp_data+0x0/0x90 kernel/kcov.c:240
Code: 48 8b 05 db b4 1a 12 48 8b 80 30 16 00 00 e9 97 05 db 09 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <49> 89 d2 49 89 f8 49 89 f1 65 48 8b 15 a7 b4 1a 12 65 8b 05 b8 b4
RSP: 0018:ffffc9000441fb50 EFLAGS: 00000293
RAX: 0000000000000000 RBX: 0000000000040000 RCX: ffffffff89ba2a52
RDX: 0000000000040000 RSI: 0000000000000433 RDI: 0000000000000004
RBP: ffffffff9b2c41ec R08: 0000000000000004 R09: 0000000000000000
R10: 0000000000000000 R11: ffffffff9b030610 R12: ffff888031800000
R13: 0000000000000433 R14: dffffc0000000000 R15: 0000000000001770
 gc_worker+0x342/0x16e0 net/netfilter/nf_conntrack_core.c:1549
 process_one_work+0x9cc/0x1b70 kernel/workqueue.c:3236
 process_scheduled_works kernel/workqueue.c:3319 [inline]
 worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400
 kthread+0x3c5/0x780 kernel/kthread.c:463
 ret_from_fork+0x5d7/0x6f0 arch/x86/kernel/process.c:148
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>
task:dhcpcd          state:R  running task     stack:26072 pid:6029  tgid:6029  ppid:5513   task_flags:0x400040 flags:0x00004002
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5357 [inline]
 __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
 preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
 irqentry_exit+0x36/0x90 kernel/entry/common.c:197
 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
RIP: 0010:orc_ip arch/x86/kernel/unwind_orc.c:80 [inline]
RIP: 0010:__orc_find+0x7e/0xf0 arch/x86/kernel/unwind_orc.c:102
Code: ea 3f 48 c1 fe 02 48 01 f2 48 d1 fa 48 8d 5c 95 00 48 89 da 48 c1 ea 03 0f b6 34 0a 48 89 da 83 e2 07 83 c2 03 40 38 f2 7c 05 <40> 84 f6 75 4b 48 63 13 48 01 da 49 39 d5 73 af 4c 8d 63 fc 49 39
RSP: 0018:ffffc90003337648 EFLAGS: 00000202
RAX: ffffffff914e0dd8 RBX: ffffffff90c5215c RCX: dffffc0000000000
RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffffffff90c52148
RBP: ffffffff90c52148 R08: ffffffff914e0e1a R09: 0000000000000000
R10: ffffc900033376f8 R11: 0000000000011271 R12: ffffffff90c52170
R13: ffffffff82127173 R14: ffffffff90c52148 R15: ffffffff90c52148
 orc_find arch/x86/kernel/unwind_orc.c:227 [inline]
 unwind_next_frame+0x2ec/0x20a0 arch/x86/kernel/unwind_orc.c:494
 arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
 stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
 kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
 kasan_save_track+0x14/0x30 mm/kasan/common.c:68
 poison_kmalloc_redzone mm/kasan/common.c:388 [inline]
 __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:405
 kmalloc_noprof include/linux/slab.h:905 [inline]
 slab_free_hook mm/slub.c:2369 [inline]
 slab_free mm/slub.c:4680 [inline]
 kmem_cache_free+0x142/0x4d0 mm/slub.c:4782
 vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
 do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
 do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
 __vm_munmap+0x19a/0x390 mm/vma.c:3155
 __do_sys_munmap mm/mmap.c:1080 [inline]
 __se_sys_munmap mm/mmap.c:1077 [inline]
 __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb13ec2f2e7
RSP: 002b:00007fffe10fae78 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
RAX: ffffffffffffffda RBX: 0000562bd1443f00 RCX: 00007fb13ec2f2e7
RDX: 0000000000000001 RSI: 000000000002f6d0 RDI: 00007fb13e9c1000
RBP: 00007fffe10faf80 R08: 00000000000004f0 R09: 0000000000000002
R10: 00007fffe10fadb0 R11: 0000000000000202 R12: 00007fffe10faec0
R13: 00007fb13ec42000 R14: 0000562bd1443f00 R15: 0000000000000000
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup



* Re: [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)
  2025-08-22  4:15 [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2) syzbot
@ 2025-08-22 12:08 ` Lorenzo Stoakes
  2025-08-22 13:55   ` Harry Yoo
  2025-08-28  2:05 ` Liam R. Howlett
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Lorenzo Stoakes @ 2025-08-22 12:08 UTC
  To: syzbot
  Cc: Liam.Howlett, akpm, jannh, linux-kernel, linux-mm, pfalcato,
	syzkaller-bugs, vbabka, Sebastian Andrzej Siewior, Harry Yoo

+cc Sebastian for RCU ORC change...

+cc Harry for slab side.

Pinging Jann for the CONFIG_SLUB_RCU_DEBUG element.

Jann - could this possibly be related to CONFIG_SLUB_RCU_DEBUG? It seems the
stack is within KASAN, but there is no KASAN report, so maybe it's KASAN itself
that's having an issue?

Though I'm thinking maybe it's the ORC unwinder itself that could be problematic
here (albeit invoked via CONFIG_SLUB_RCU_DEBUG)... and yeah, kinda suspicious
because:

- We have two threads freeing VMAs using SLAB_TYPESAFE_BY_RCU
- CONFIG_SLUB_RCU_DEBUG means that we use KASAN to save an aux stack, which
  makes us do an unwind via ORC, which then takes an RCU read lock in
  unwind_next_frame(), and both are doing this unwinding at the time of the
  report.
- ???
- Somehow things get locked up?

I'm not an RCU expert (clearly :) so I'm not sure exactly how this could result
in a stall, but it's suspicious.

On Thu, Aug 21, 2025 at 09:15:37PM -0700, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    be48bcf004f9 Merge tag 'for-6.17-rc2-tag' of git://git.ker..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=136dfba2580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=142508fb116c212f

lockdep (CONFIG_PROVE_LOCKING) is on, so I'm guessing there's no deadlock here.

CONFIG_DEBUG_VM_MAPLE_TREE is enabled, which will cause _major_ slowdown on VMA
operations as the tree is constantly being fully validated.

This may explain the stalls...

> dashboard link: https://syzkaller.appspot.com/bug?extid=8785aaf121cfb2141e0d
> compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=107a43bc580000

No C repro yet...

>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/24fd400c6842/disk-be48bcf0.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/59146305635d/vmlinux-be48bcf0.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/b3e5f65cbcc8/bzImage-be48bcf0.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+8785aaf121cfb2141e0d@syzkaller.appspotmail.com
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-1): P6029/1:b..l P1208/1:b..l P6031/3:b..l P6030/1:b..l
> rcu: 	(detected by 1, t=10502 jiffies, g=6285, q=421 ncpus=2)

So 105s, or 1m45s, that's pretty long...
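
As a quick check of that arithmetic, here is a small sketch of the conversion.
It assumes CONFIG_HZ=100, which is what makes 10502 jiffies come out at about
105s; I have not confirmed the value against the linked .config, and the helper
name and struct are made up for illustration only.

```c
#include <stdint.h>

/* Whole seconds plus leftover milliseconds for a jiffies count. */
struct stall_time {
	uint64_t secs;
	uint64_t msecs;
};

/*
 * Hypothetical helper, not a kernel API.
 * `hz` stands in for CONFIG_HZ; 100 is an assumption, not a value
 * taken from the report.
 */
struct stall_time jiffies_to_time(uint64_t jiffies, uint64_t hz)
{
	struct stall_time t;

	t.secs = jiffies / hz;
	t.msecs = (jiffies % hz) * 1000 / hz;
	return t;
}
```

With hz=100, jiffies_to_time(10502, 100) gives 105s and 20ms. With hz=1000 the
same count would only be about 10.5s, so the HZ assumption matters here.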

> task:dhcpcd          state:R  running task     stack:28896 pid:6030  tgid:6030  ppid:5513   task_flags:0x400040 flags:0x00004002
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5357 [inline]
>  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
>  preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
>  irqentry_exit+0x36/0x90 kernel/entry/common.c:197
>  asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
> RIP: 0010:unwind_next_frame+0xfe7/0x20a0 arch/x86/kernel/unwind_orc.c:664

Hmm, while the line number is not pertinent, I notice unwind_next_frame() has:

guard(rcu)()

This came in with commit 14daa3bca217 ("x86: Use RCU in all users of
__module_address()."), though that dates from Jan 2025...

This is defined (took me a while to track down!!) in include/linux/rcupdate.h:

DEFINE_LOCK_GUARD_0(rcu,
	do {
		rcu_read_lock();
		/*
		 * sparse doesn't call the cleanup function,
		 * so just release immediately and don't track
		 * the context. We don't need to anyway, since
		 * the whole point of the guard is to not need
		 * the explicit unlock.
		 */
		__release(RCU);
	} while (0),
	rcu_read_unlock())

Meaning it's equivalent to a scoped rcu_read_lock() / rcu_read_unlock().
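
To illustrate what "scoped" means here, below is a userspace toy built on the
GCC/Clang cleanup attribute, which is the mechanism the kernel's guard()
helpers rely on. All fake_* names are hypothetical stand-ins: they only count
nesting depth and do nothing RCU-related, so this is a sketch of the scoping
behaviour, not of the kernel implementation.

```c
/* Hypothetical stand-ins for rcu_read_lock()/rcu_read_unlock(). */
int fake_rcu_depth;

void fake_rcu_read_lock(void)
{
	fake_rcu_depth++;
}

void fake_rcu_read_unlock(void)
{
	fake_rcu_depth--;
}

/* Called by the compiler when the guard variable goes out of scope. */
void fake_rcu_guard_exit(int *unused)
{
	(void)unused;
	fake_rcu_read_unlock();
}

/* Toy analogue of guard(rcu)(): lock now, unlock at scope exit. */
#define FAKE_RCU_GUARD() \
	int fake_guard __attribute__((cleanup(fake_rcu_guard_exit), unused)) = \
		(fake_rcu_read_lock(), 0)

/* Returns the depth seen inside the guarded scope. */
int depth_inside_guard(int bail_early)
{
	FAKE_RCU_GUARD();

	if (bail_early)
		return fake_rcu_depth;	/* early return still unlocks */

	return fake_rcu_depth;
}
```

Both depth_inside_guard(0) and depth_inside_guard(1) observe a depth of 1 while
inside the scope, and fake_rcu_depth is back to 0 once they return, without any
explicit unlock call on either path.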

But since there is no C repro, this is likely a race of some kind that might be
very hard to hit.

> Code: 85 80 0c 00 00 49 89 6d 40 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 56 10 00 00 <41> 39 5d 00 0f 84 10 06 00 00 bd 01 00 00 00 e9 de f3 ff ff 48 b8
> RSP: 0018:ffffc90003cdf6a8 EFLAGS: 00000246
> RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffc90003ce0000
> RDX: 1ffff9200079bee3 RSI: ffffc90003cdfa70 RDI: ffffc90003cdf758
> RBP: ffffc90003cdfae0 R08: 0000000000000001 R09: 0000000000000000
> R10: ffffc90003cdf718 R11: 00000000000121e6 R12: ffffc90003cdf768
> R13: ffffc90003cdf718 R14: ffffc90003cdfa80 R15: ffffc90003cdf74c
>  arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
>  stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
>  kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
>  kasan_record_aux_stack+0xa7/0xc0 mm/kasan/generic.c:548
>  slab_free_hook mm/slub.c:2378 [inline]

Invokes the CONFIG_SLUB_RCU_DEBUG stack trace saving stuff

>  slab_free mm/slub.c:4680 [inline]
>  kmem_cache_free+0x15a/0x4d0 mm/slub.c:4782

Note that VMAs are SLAB_TYPESAFE_BY_RCU so maybe that's somehow playing a role
here?

In free_slab():

	if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU))
		call_rcu(&slab->rcu_head, rcu_free_slab);

>  vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
>  do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
>  do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
>  __vm_munmap+0x19a/0x390 mm/vma.c:3155
>  __do_sys_munmap mm/mmap.c:1080 [inline]
>  __se_sys_munmap mm/mmap.c:1077 [inline]
>  __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
>  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>  do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f

Seems a normal trace for an unmap. Note that inlining hides some frames here;
the full chain is:

vms_complete_munmap_vmas() -> remove_vma() -> vm_area_free() -> kmem_cache_free()

> RIP: 0033:0x7fb13ec2f2e7
> RSP: 002b:00007fffe10fae78 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
> RAX: ffffffffffffffda RBX: 0000562bd1443510 RCX: 00007fb13ec2f2e7
> RDX: 0000000000000002 RSI: 0000000000004028 RDI: 00007fb13ea1b000
> RBP: 00007fffe10faf80 R08: 0000562bd1432470 R09: 0000000000000001
> R10: 00007fffe10fadb0 R11: 0000000000000206 R12: 00007fffe10faea0
> R13: 00007fb13ec42000 R14: 0000562bd1443510 R15: 0000000000000000
>  </TASK>
> task:syz-executor    state:R  running task     stack:27632 pid:6031  tgid:6031  ppid:5870   task_flags:0x400000 flags:0x00004000
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5357 [inline]
>  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
>  preempt_schedule_common+0x44/0xc0 kernel/sched/core.c:7145
>  preempt_schedule_thunk+0x16/0x30 arch/x86/entry/thunk.S:12
>  __raw_spin_unlock include/linux/spinlock_api_smp.h:143 [inline]
>  _raw_spin_unlock+0x3e/0x50 kernel/locking/spinlock.c:186
>  spin_unlock include/linux/spinlock.h:391 [inline]
>  filemap_map_pages+0xe15/0x1670 mm/filemap.c:3791
>  do_fault_around mm/memory.c:5531 [inline]
>  do_read_fault mm/memory.c:5564 [inline]
>  do_fault mm/memory.c:5707 [inline]
>  do_pte_missing+0xe39/0x3ba0 mm/memory.c:4234
>  handle_pte_fault mm/memory.c:6052 [inline]
>  __handle_mm_fault+0x152a/0x2a50 mm/memory.c:6195
>  handle_mm_fault+0x589/0xd10 mm/memory.c:6364
>  do_user_addr_fault+0x60c/0x1370 arch/x86/mm/fault.c:1336
>  handle_page_fault arch/x86/mm/fault.c:1476 [inline]
>  exc_page_fault+0x5c/0xb0 arch/x86/mm/fault.c:1532
>  asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623

Faulting path being context switched on unlock of PTE spinlock...

> RIP: 0033:0x7f54cd7177c7
> RSP: 002b:00007fffb79a5b40 EFLAGS: 00010246
> RAX: 00007f54ce525000 RBX: 0000000000000000 RCX: 0000000000000064
> RDX: 00007fffb79a5de9 RSI: 0000000000000002 RDI: 00007fffb79a5dd8
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
> R13: 00007fffb79a5c48 R14: 0000000000000000 R15: 0000000000000000
>  </TASK>
> task:kworker/0:3     state:R  running task     stack:25368 pid:1208  tgid:1208  ppid:2      task_flags:0x4208060 flags:0x00004000
> Workqueue: events_power_efficient gc_worker
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5357 [inline]
>  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
>  preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
>  irqentry_exit+0x36/0x90 kernel/entry/common.c:197
>  asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
> RIP: 0010:write_comp_data+0x0/0x90 kernel/kcov.c:240
> Code: 48 8b 05 db b4 1a 12 48 8b 80 30 16 00 00 e9 97 05 db 09 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <49> 89 d2 49 89 f8 49 89 f1 65 48 8b 15 a7 b4 1a 12 65 8b 05 b8 b4
> RSP: 0018:ffffc9000441fb50 EFLAGS: 00000293
> RAX: 0000000000000000 RBX: 0000000000040000 RCX: ffffffff89ba2a52
> RDX: 0000000000040000 RSI: 0000000000000433 RDI: 0000000000000004
> RBP: ffffffff9b2c41ec R08: 0000000000000004 R09: 0000000000000000
> R10: 0000000000000000 R11: ffffffff9b030610 R12: ffff888031800000
> R13: 0000000000000433 R14: dffffc0000000000 R15: 0000000000001770
>  gc_worker+0x342/0x16e0 net/netfilter/nf_conntrack_core.c:1549
>  process_one_work+0x9cc/0x1b70 kernel/workqueue.c:3236
>  process_scheduled_works kernel/workqueue.c:3319 [inline]
>  worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400
>  kthread+0x3c5/0x780 kernel/kthread.c:463
>  ret_from_fork+0x5d7/0x6f0 arch/x86/kernel/process.c:148
>  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
>  </TASK>
> task:dhcpcd          state:R  running task     stack:26072 pid:6029  tgid:6029  ppid:5513   task_flags:0x400040 flags:0x00004002
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5357 [inline]
>  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
>  preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
>  irqentry_exit+0x36/0x90 kernel/entry/common.c:197
>  asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
> RIP: 0010:orc_ip arch/x86/kernel/unwind_orc.c:80 [inline]
> RIP: 0010:__orc_find+0x7e/0xf0 arch/x86/kernel/unwind_orc.c:102
> Code: ea 3f 48 c1 fe 02 48 01 f2 48 d1 fa 48 8d 5c 95 00 48 89 da 48 c1 ea 03 0f b6 34 0a 48 89 da 83 e2 07 83 c2 03 40 38 f2 7c 05 <40> 84 f6 75 4b 48 63 13 48 01 da 49 39 d5 73 af 4c 8d 63 fc 49 39
> RSP: 0018:ffffc90003337648 EFLAGS: 00000202
> RAX: ffffffff914e0dd8 RBX: ffffffff90c5215c RCX: dffffc0000000000
> RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffffffff90c52148
> RBP: ffffffff90c52148 R08: ffffffff914e0e1a R09: 0000000000000000
> R10: ffffc900033376f8 R11: 0000000000011271 R12: ffffffff90c52170
> R13: ffffffff82127173 R14: ffffffff90c52148 R15: ffffffff90c52148
>  orc_find arch/x86/kernel/unwind_orc.c:227 [inline]
>  unwind_next_frame+0x2ec/0x20a0 arch/x86/kernel/unwind_orc.c:494

This is also RCU-read locked.

>  arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
>  stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
>  kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
>  kasan_save_track+0x14/0x30 mm/kasan/common.c:68
>  poison_kmalloc_redzone mm/kasan/common.c:388 [inline]
>  __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:405
>  kmalloc_noprof include/linux/slab.h:905 [inline]
>  slab_free_hook mm/slub.c:2369 [inline]
>  slab_free mm/slub.c:4680 [inline]
>  kmem_cache_free+0x142/0x4d0 mm/slub.c:4782
>  vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
>  do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
>  do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
>  __vm_munmap+0x19a/0x390 mm/vma.c:3155

Simultaneous unmap?

>  __do_sys_munmap mm/mmap.c:1080 [inline]
>  __se_sys_munmap mm/mmap.c:1077 [inline]
>  __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
>  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>  do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7fb13ec2f2e7
> RSP: 002b:00007fffe10fae78 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
> RAX: ffffffffffffffda RBX: 0000562bd1443f00 RCX: 00007fb13ec2f2e7
> RDX: 0000000000000001 RSI: 000000000002f6d0 RDI: 00007fb13e9c1000
> RBP: 00007fffe10faf80 R08: 00000000000004f0 R09: 0000000000000002
> R10: 00007fffe10fadb0 R11: 0000000000000202 R12: 00007fffe10faec0
> R13: 00007fb13ec42000 R14: 0000562bd1443f00 R15: 0000000000000000
>  </TASK>

Cheers, Lorenzo



* Re: [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)
  2025-08-22 12:08 ` Lorenzo Stoakes
@ 2025-08-22 13:55   ` Harry Yoo
  2025-08-28  0:29     ` Josh Poimboeuf
  0 siblings, 1 reply; 11+ messages in thread
From: Harry Yoo @ 2025-08-22 13:55 UTC
  To: Lorenzo Stoakes
  Cc: syzbot, Liam.Howlett, akpm, jannh, linux-kernel, linux-mm,
	pfalcato, syzkaller-bugs, vbabka, Sebastian Andrzej Siewior,
	jpoimboe, peterz

On Fri, Aug 22, 2025 at 01:08:02PM +0100, Lorenzo Stoakes wrote:
> +cc Sebastian for RCU ORC change...
> 
> +cc Harry for slab side.

+cc Josh and Peter for stack unwinding stuff.

> Pinging Jann for the CONFIG_SLUB_RCU_DEBUG element.
> 
> Jann - could this possibly be related to CONFIG_SLUB_RCU_DEBUG? It seems the
> stack is within KASAN, but there is no KASAN report, so maybe it's KASAN itself
> that's having an issue?
> 
> Though I'm thinking maybe it's the ORC unwinder itself that could be problematic
> here (albeit invoked via CONFIG_SLUB_RCU_DEBUG)... and yeah, kinda suspicious
> because:
> 
> - We have two threads freeing VMAs using SLAB_TYPESAFE_BY_RCU
> - CONFIG_SLUB_RCU_DEBUG means that we use KASAN to save an aux stack, which
>   makes us do an unwind via ORC, which then takes an RCU read lock in
>   unwind_next_frame(), and both are doing this unwinding at the time of the
>   report.
> - ???
> - Somehow things get locked up?
> 
> I'm not an RCU expert (clearly :) so I'm not sure exactly how this could result
> in a stall, but it's suspicious.

Could this be caused by misleading ORC data, or by a logic error in the ORC
unwinder that makes it fall into an infinite loop (unwind_done() never
returning true in arch_stack_walk())?

...because the reported line number doesn't really make sense as a cause of
stalls.
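
To make the suspected failure mode concrete, here is a toy userspace model of
a frame walk shaped like the loop in arch_stack_walk() as I understand it
(start, test unwind_done(), advance with unwind_next_frame()). Everything
prefixed toy_ is hypothetical, and the step cap exists only so the sketch
terminates; the real unwinder has its own sanity checks, which this does not
model.

```c
#include <stdbool.h>

/* Toy frame: `next` is the index of the caller's frame, -1 at stack end. */
struct toy_frame {
	int next;
};

struct toy_unwind_state {
	const struct toy_frame *frames;
	int cur;
};

bool toy_unwind_done(const struct toy_unwind_state *s)
{
	return s->cur < 0;
}

void toy_unwind_next_frame(struct toy_unwind_state *s)
{
	s->cur = s->frames[s->cur].next;
}

/*
 * Walk frames starting at index 0. Returns the number of frames
 * visited, or -1 if the walk exceeded max_steps, i.e. it would not
 * have terminated on its own.
 */
int toy_stack_walk(const struct toy_frame *frames, int max_steps)
{
	struct toy_unwind_state s = { .frames = frames, .cur = 0 };
	int steps = 0;

	for (; !toy_unwind_done(&s); toy_unwind_next_frame(&s)) {
		if (++steps > max_steps)
			return -1;
	}
	return steps;
}
```

A chain 0 -> 1 -> 2 -> end is walked in three steps, while bad data that makes
frame 2 point back at frame 1 never reaches the end and only stops because of
the cap.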

-- 
Cheers,
Harry / Hyeonggon

> On Thu, Aug 21, 2025 at 09:15:37PM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    be48bcf004f9 Merge tag 'for-6.17-rc2-tag' of git://git.ker..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=136dfba2580000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=142508fb116c212f
> 
> lockdep (CONFIG_PROVE_LOCKING) is on, so I'm guessing there's no deadlock here.
> 
> CONFIG_DEBUG_VM_MAPLE_TREE is enabled, which will cause _major_ slowdown on VMA
> operations as the tree is constantly being fully validated.
> 
> This may explain the stalls...
> 
> > dashboard link: https://syzkaller.appspot.com/bug?extid=8785aaf121cfb2141e0d
> > compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=107a43bc580000
> 
> No C repro yet...
> 
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/24fd400c6842/disk-be48bcf0.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/59146305635d/vmlinux-be48bcf0.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/b3e5f65cbcc8/bzImage-be48bcf0.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+8785aaf121cfb2141e0d@syzkaller.appspotmail.com
> >
> > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-1): P6029/1:b..l P1208/1:b..l P6031/3:b..l P6030/1:b..l
> > rcu: 	(detected by 1, t=10502 jiffies, g=6285, q=421 ncpus=2)
> 
> So 105s, or 1m45s, that's pretty long...
> 
> > task:dhcpcd          state:R  running task     stack:28896 pid:6030  tgid:6030  ppid:5513   task_flags:0x400040 flags:0x00004002
> > Call Trace:
> >  <TASK>
> >  context_switch kernel/sched/core.c:5357 [inline]
> >  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> >  preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
> >  irqentry_exit+0x36/0x90 kernel/entry/common.c:197
> >  asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
> > RIP: 0010:unwind_next_frame+0xfe7/0x20a0 arch/x86/kernel/unwind_orc.c:664
> 
> Hmm, while the line number is not pertinent, I notice unwind_next_frame() has:
> 
> guard(rcu)()
> 
> > This came in with commit 14daa3bca217 ("x86: Use RCU in all users of
> > __module_address()."), though that dates from Jan 2025...
> 
> This is defined (took me a while to track down!!) in include/linux/rcupdate.h:
> 
> DEFINE_LOCK_GUARD_0(rcu,
> 	do {
> 		rcu_read_lock();
> 		/*
> 		 * sparse doesn't call the cleanup function,
> 		 * so just release immediately and don't track
> 		 * the context. We don't need to anyway, since
> 		 * the whole point of the guard is to not need
> 		 * the explicit unlock.
> 		 */
> 		__release(RCU);
> 	} while (0),
> 	rcu_read_unlock())
> 
> Meaning it's equivalent to a scoped rcu_read_lock() / rcu_read_unlock().
> 
> > But since there is no C repro, this is likely a race of some kind that might be
> > very hard to hit.
> 
> > Code: 85 80 0c 00 00 49 89 6d 40 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 56 10 00 00 <41> 39 5d 00 0f 84 10 06 00 00 bd 01 00 00 00 e9 de f3 ff ff 48 b8
> > RSP: 0018:ffffc90003cdf6a8 EFLAGS: 00000246
> > RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffc90003ce0000
> > RDX: 1ffff9200079bee3 RSI: ffffc90003cdfa70 RDI: ffffc90003cdf758
> > RBP: ffffc90003cdfae0 R08: 0000000000000001 R09: 0000000000000000
> > R10: ffffc90003cdf718 R11: 00000000000121e6 R12: ffffc90003cdf768
> > R13: ffffc90003cdf718 R14: ffffc90003cdfa80 R15: ffffc90003cdf74c
> >  arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
> >  stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
> >  kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
> >  kasan_record_aux_stack+0xa7/0xc0 mm/kasan/generic.c:548
> >  slab_free_hook mm/slub.c:2378 [inline]
> 
> Invokes the CONFIG_SLUB_RCU_DEBUG stack trace saving stuff
> 
> >  slab_free mm/slub.c:4680 [inline]
> >  kmem_cache_free+0x15a/0x4d0 mm/slub.c:4782
> 
> Note that VMAs are SLAB_TYPESAFE_BY_RCU so maybe that's somehow playing a role
> here?
> 
> In free_slab():
> 
> 	if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU))
> 		call_rcu(&slab->rcu_head, rcu_free_slab);
> 
> >  vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
> >  do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
> >  do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
> >  __vm_munmap+0x19a/0x390 mm/vma.c:3155
> >  __do_sys_munmap mm/mmap.c:1080 [inline]
> >  __se_sys_munmap mm/mmap.c:1077 [inline]
> >  __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
> >  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> >  do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> >  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> 
> > Seems a normal trace for an unmap. Note that inlining hides some frames here;
> > the full chain is:
> 
> vms_complete_munmap_vmas() -> remove_vma() -> vm_area_free() -> kmem_cache_free()
> 
> > RIP: 0033:0x7fb13ec2f2e7
> > RSP: 002b:00007fffe10fae78 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
> > RAX: ffffffffffffffda RBX: 0000562bd1443510 RCX: 00007fb13ec2f2e7
> > RDX: 0000000000000002 RSI: 0000000000004028 RDI: 00007fb13ea1b000
> > RBP: 00007fffe10faf80 R08: 0000562bd1432470 R09: 0000000000000001
> > R10: 00007fffe10fadb0 R11: 0000000000000206 R12: 00007fffe10faea0
> > R13: 00007fb13ec42000 R14: 0000562bd1443510 R15: 0000000000000000
> >  </TASK>
> > task:syz-executor    state:R  running task     stack:27632 pid:6031  tgid:6031  ppid:5870   task_flags:0x400000 flags:0x00004000
> > Call Trace:
> >  <TASK>
> >  context_switch kernel/sched/core.c:5357 [inline]
> >  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> >  preempt_schedule_common+0x44/0xc0 kernel/sched/core.c:7145
> >  preempt_schedule_thunk+0x16/0x30 arch/x86/entry/thunk.S:12
> >  __raw_spin_unlock include/linux/spinlock_api_smp.h:143 [inline]
> >  _raw_spin_unlock+0x3e/0x50 kernel/locking/spinlock.c:186
> >  spin_unlock include/linux/spinlock.h:391 [inline]
> >  filemap_map_pages+0xe15/0x1670 mm/filemap.c:3791
> >  do_fault_around mm/memory.c:5531 [inline]
> >  do_read_fault mm/memory.c:5564 [inline]
> >  do_fault mm/memory.c:5707 [inline]
> >  do_pte_missing+0xe39/0x3ba0 mm/memory.c:4234
> >  handle_pte_fault mm/memory.c:6052 [inline]
> >  __handle_mm_fault+0x152a/0x2a50 mm/memory.c:6195
> >  handle_mm_fault+0x589/0xd10 mm/memory.c:6364
> >  do_user_addr_fault+0x60c/0x1370 arch/x86/mm/fault.c:1336
> >  handle_page_fault arch/x86/mm/fault.c:1476 [inline]
> >  exc_page_fault+0x5c/0xb0 arch/x86/mm/fault.c:1532
> >  asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
> 
> Faulting path being context switched on unlock of PTE spinlock...
> 
> > RIP: 0033:0x7f54cd7177c7
> > RSP: 002b:00007fffb79a5b40 EFLAGS: 00010246
> > RAX: 00007f54ce525000 RBX: 0000000000000000 RCX: 0000000000000064
> > RDX: 00007fffb79a5de9 RSI: 0000000000000002 RDI: 00007fffb79a5dd8
> > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
> > R13: 00007fffb79a5c48 R14: 0000000000000000 R15: 0000000000000000
> >  </TASK>
> > task:kworker/0:3     state:R  running task     stack:25368 pid:1208  tgid:1208  ppid:2      task_flags:0x4208060 flags:0x00004000
> > Workqueue: events_power_efficient gc_worker
> > Call Trace:
> >  <TASK>
> >  context_switch kernel/sched/core.c:5357 [inline]
> >  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> >  preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
> >  irqentry_exit+0x36/0x90 kernel/entry/common.c:197
> >  asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
> > RIP: 0010:write_comp_data+0x0/0x90 kernel/kcov.c:240
> > Code: 48 8b 05 db b4 1a 12 48 8b 80 30 16 00 00 e9 97 05 db 09 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <49> 89 d2 49 89 f8 49 89 f1 65 48 8b 15 a7 b4 1a 12 65 8b 05 b8 b4
> > RSP: 0018:ffffc9000441fb50 EFLAGS: 00000293
> > RAX: 0000000000000000 RBX: 0000000000040000 RCX: ffffffff89ba2a52
> > RDX: 0000000000040000 RSI: 0000000000000433 RDI: 0000000000000004
> > RBP: ffffffff9b2c41ec R08: 0000000000000004 R09: 0000000000000000
> > R10: 0000000000000000 R11: ffffffff9b030610 R12: ffff888031800000
> > R13: 0000000000000433 R14: dffffc0000000000 R15: 0000000000001770
> >  gc_worker+0x342/0x16e0 net/netfilter/nf_conntrack_core.c:1549
> >  process_one_work+0x9cc/0x1b70 kernel/workqueue.c:3236
> >  process_scheduled_works kernel/workqueue.c:3319 [inline]
> >  worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400
> >  kthread+0x3c5/0x780 kernel/kthread.c:463
> >  ret_from_fork+0x5d7/0x6f0 arch/x86/kernel/process.c:148
> >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> >  </TASK>
> > task:dhcpcd          state:R  running task     stack:26072 pid:6029  tgid:6029  ppid:5513   task_flags:0x400040 flags:0x00004002
> > Call Trace:
> >  <TASK>
> >  context_switch kernel/sched/core.c:5357 [inline]
> >  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> >  preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
> >  irqentry_exit+0x36/0x90 kernel/entry/common.c:197
> >  asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
> > RIP: 0010:orc_ip arch/x86/kernel/unwind_orc.c:80 [inline]
> > RIP: 0010:__orc_find+0x7e/0xf0 arch/x86/kernel/unwind_orc.c:102
> > Code: ea 3f 48 c1 fe 02 48 01 f2 48 d1 fa 48 8d 5c 95 00 48 89 da 48 c1 ea 03 0f b6 34 0a 48 89 da 83 e2 07 83 c2 03 40 38 f2 7c 05 <40> 84 f6 75 4b 48 63 13 48 01 da 49 39 d5 73 af 4c 8d 63 fc 49 39
> > RSP: 0018:ffffc90003337648 EFLAGS: 00000202
> > RAX: ffffffff914e0dd8 RBX: ffffffff90c5215c RCX: dffffc0000000000
> > RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffffffff90c52148
> > RBP: ffffffff90c52148 R08: ffffffff914e0e1a R09: 0000000000000000
> > R10: ffffc900033376f8 R11: 0000000000011271 R12: ffffffff90c52170
> > R13: ffffffff82127173 R14: ffffffff90c52148 R15: ffffffff90c52148
> >  orc_find arch/x86/kernel/unwind_orc.c:227 [inline]
> >  unwind_next_frame+0x2ec/0x20a0 arch/x86/kernel/unwind_orc.c:494
> 
> This is also RCU-read locked.
> 
> >  arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
> >  stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
> >  kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
> >  kasan_save_track+0x14/0x30 mm/kasan/common.c:68
> >  poison_kmalloc_redzone mm/kasan/common.c:388 [inline]
> >  __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:405
> >  kmalloc_noprof include/linux/slab.h:905 [inline]
> >  slab_free_hook mm/slub.c:2369 [inline]
> >  slab_free mm/slub.c:4680 [inline]
> >  kmem_cache_free+0x142/0x4d0 mm/slub.c:4782
> >  vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
> >  do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
> >  do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
> >  __vm_munmap+0x19a/0x390 mm/vma.c:3155
> 
> Simultaneous unmap?
> 
> >  __do_sys_munmap mm/mmap.c:1080 [inline]
> >  __se_sys_munmap mm/mmap.c:1077 [inline]
> >  __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
> >  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> >  do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> >  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > RIP: 0033:0x7fb13ec2f2e7
> > RSP: 002b:00007fffe10fae78 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
> > RAX: ffffffffffffffda RBX: 0000562bd1443f00 RCX: 00007fb13ec2f2e7
> > RDX: 0000000000000001 RSI: 000000000002f6d0 RDI: 00007fb13e9c1000
> > RBP: 00007fffe10faf80 R08: 00000000000004f0 R09: 0000000000000002
> > R10: 00007fffe10fadb0 R11: 0000000000000202 R12: 00007fffe10faec0
> > R13: 00007fb13ec42000 R14: 0000562bd1443f00 R15: 0000000000000000
> >  </TASK>
> >
> >
> 
> Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)
  2025-08-22 13:55   ` Harry Yoo
@ 2025-08-28  0:29     ` Josh Poimboeuf
  2025-08-28  1:57       ` Liam R. Howlett
  0 siblings, 1 reply; 11+ messages in thread
From: Josh Poimboeuf @ 2025-08-28  0:29 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Lorenzo Stoakes, syzbot, Liam.Howlett, akpm, jannh, linux-kernel,
	linux-mm, pfalcato, syzkaller-bugs, vbabka,
	Sebastian Andrzej Siewior, peterz

On Fri, Aug 22, 2025 at 10:55:10PM +0900, Harry Yoo wrote:
> On Fri, Aug 22, 2025 at 01:08:02PM +0100, Lorenzo Stoakes wrote:
> > +cc Sebastian for RCU ORC change...
> > 
> > +cc Harry for slab side.
> 
> +cc Josh and Peter for stack unwinding stuff.
> 
> > Pinging Jann for the CONFIG_SLUB_RCU_DEBUG element.
> > 
> > Jann - could this possibly be related to CONFIG_SLUB_RCU_DEBUG? It seems
> > the stack is within KASAN, but there's no KASAN report, so maybe it's KASAN
> > itself that's having an issue?
> > 
> > Though I'm thinking maybe it's the orc unwinder itself that could be problematic
> > here (yet invoked by CONFIG_SLUB_RCU_DEBUG though)... and yeah kinda suspicious
> > because:
> > 
> > - We have two threads freeing VMAs using SLAB_TYPESAFE_BY_RCU
> > - CONFIG_SLUB_RCU_DEBUG means that we use KASAN to save an aux stack, which
> >   makes us do an unwind via ORC, which then takes an RCU read lock on
> >   unwind_next_frame(), and both are doing this unwinding at the time of report.
> > - ???
> > - Somehow things get locked up?
> > 
> > I'm not an RCU expert (clearly :) so I'm not sure exactly how this could result
> > in a stall, but it's suspicious.
> 
> Can this be because of misleading ORC data or a logic error in the ORC unwinder
> that makes it fall into an infinite loop (unwind_done() never returning
> true in arch_stack_walk())?
> 
> ...because the reported line number doesn't really make sense
> as a cause of stalls.

There shouldn't be any way for ORC to hit an infinite loop.  Worst case
it would stop after the caller's buffer fills up.  ORC has always been
solid, and the RCU usage looks fine to me.  I tend to doubt ORC is at
fault here.
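
To illustrate the bound: a minimal user-space sketch, not the kernel code, of
a stack walk whose consumer refuses entries once the caller's buffer is full.
All toy_* names are made up for this example, and the cyclic flag is an
assumption standing in for hypothetical bad unwind data that never reports
completion.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for unwinder state: a list of return addresses. */
struct toy_unwind_state {
	const unsigned long *frames;	/* innermost frame first */
	size_t nr_frames;
	size_t pos;
	bool cyclic;			/* assumption: models data that never terminates */
};

/* Caller-owned output buffer with a fixed capacity. */
struct toy_trace_buf {
	unsigned long *store;
	size_t size;
	size_t len;
};

/* Store one entry; return false once the buffer cannot take another. */
static bool toy_consume_entry(struct toy_trace_buf *buf, unsigned long addr)
{
	if (buf->len >= buf->size)
		return false;
	buf->store[buf->len++] = addr;
	return buf->len < buf->size;
}

/* A cyclic state never reports completion unless it has no frames at all. */
static bool toy_unwind_done(const struct toy_unwind_state *st)
{
	if (st->nr_frames == 0)
		return true;
	return !st->cyclic && st->pos >= st->nr_frames;
}

/* Advance one frame, wrapping around when the state is cyclic. */
static void toy_unwind_next_frame(struct toy_unwind_state *st)
{
	st->pos++;
	if (st->cyclic && st->pos >= st->nr_frames)
		st->pos = 0;
}

/*
 * Walk frames until the unwinder reports completion or the consumer
 * refuses more entries. Returns the number of entries stored.
 */
static size_t toy_stack_walk(struct toy_unwind_state *st,
			     struct toy_trace_buf *buf)
{
	for (; !toy_unwind_done(st); toy_unwind_next_frame(st)) {
		if (!toy_consume_entry(buf, st->frames[st->pos]))
			break;
	}
	return buf->len;
}
```

With three frames, cyclic set and an 8-entry buffer, toy_stack_walk() stores
exactly 8 entries and returns, so the walk is bounded by the buffer size even
though toy_unwind_done() never becomes true.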

Maybe some interaction higher up the stack is causing things to run in a
tight loop.

All those debugging options (e.g., DEBUG_VM_MAPLE_TREE, LOCKDEP, KASAN,
SLUB_RCU_DEBUG...) could be a factor in slowing things down to a crawl.

-- 
Josh


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)
  2025-08-28  0:29     ` Josh Poimboeuf
@ 2025-08-28  1:57       ` Liam R. Howlett
  2025-08-28  3:35         ` Liam R. Howlett
  0 siblings, 1 reply; 11+ messages in thread
From: Liam R. Howlett @ 2025-08-28  1:57 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Harry Yoo, Lorenzo Stoakes, syzbot, akpm, jannh, linux-kernel,
	linux-mm, pfalcato, syzkaller-bugs, vbabka,
	Sebastian Andrzej Siewior, peterz

* Josh Poimboeuf <jpoimboe@kernel.org> [250827 20:29]:
> On Fri, Aug 22, 2025 at 10:55:10PM +0900, Harry Yoo wrote:
> > On Fri, Aug 22, 2025 at 01:08:02PM +0100, Lorenzo Stoakes wrote:
> > > +cc Sebastian for RCU ORC change...
> > > 
> > > +cc Harry for slab side.
> > 
> > +cc Josh and Peter for stack unwinding stuff.
> > 
> > > Pinging Jann for the CONFIG_SLUB_RCU_DEBUG element.
> > > 
> > > Jann - could this possibly be related to CONFIG_SLUB_RCU_DEBUG? It seems
> > > the stack is within KASAN, but there's no KASAN report, so maybe it's KASAN
> > > itself that's having an issue?
> > > 
> > > Though I'm thinking maybe it's the orc unwinder itself that could be problematic
> > > here (yet invoked by CONFIG_SLUB_RCU_DEBUG though)... and yeah kinda suspicious
> > > because:
> > > 
> > > - We have two threads freeing VMAs using SLAB_TYPESAFE_BY_RCU
> > > - CONFIG_SLUB_RCU_DEBUG means that we use KASAN to save an aux stack, which
> > >   makes us do an unwind via ORC, which then takes an RCU read lock on
> > >   unwind_next_frame(), and both are doing this unwinding at the time of report.
> > > - ???
> > > - Somehow things get locked up?
> > > 
> > > I'm not an RCU expert (clearly :) so I'm not sure exactly how this could result
> > > in a stall, but it's suspicious.
> > 
> > Can this be because of misleading ORC data or a logic error in the ORC unwinder
> > that makes it fall into an infinite loop (unwind_done() never returning
> > true in arch_stack_walk())?
> > 
> > ...because the reported line number doesn't really make sense
> > as a cause of stalls.
> 
> There shouldn't be any way for ORC to hit an infinite loop.  Worst case
> it would stop after the caller's buffer fills up.  ORC has always been
> solid, and the RCU usage looks fine to me.  I tend to doubt ORC is at
> fault here.
> 
> Maybe some interaction higher up the stack is causing things to run in a
> tight loop.
> 
> All those debugging options (e.g., DEBUG_VM_MAPLE_TREE, LOCKDEP, KASAN,
> SLUB_RCU_DEBUG...) could be a factor in slowing things down to a crawl.

DEBUG_VM_MAPLE_TREE is super heavy, but its cost comes from validate_mm(),
which is usually the last thing to run before returning.

I mean surely that would show up in the logs.

Okay, it's in the second log on the dashboard.

Yeah, I think it's the debug options eventually causing the failure.  Apparently
there's a syz reproducer now, but without the validate_mm().


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)
  2025-08-22  4:15 [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2) syzbot
  2025-08-22 12:08 ` Lorenzo Stoakes
@ 2025-08-28  2:05 ` Liam R. Howlett
  2025-08-28  2:05   ` syzbot
  2025-08-28  2:20 ` Liam R. Howlett
  2025-12-19 19:37 ` syzbot
  3 siblings, 1 reply; 11+ messages in thread
From: Liam R. Howlett @ 2025-08-28  2:05 UTC (permalink / raw)
  To: syzbot
  Cc: akpm, jannh, linux-kernel, linux-mm, lorenzo.stoakes, pfalcato,
	syzkaller-bugs, vbabka

* syzbot <syzbot+8785aaf121cfb2141e0d@syzkaller.appspotmail.com> [250822 00:15]:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    be48bcf004f9 Merge tag 'for-6.17-rc2-tag' of git://git.ker..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=136dfba2580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=142508fb116c212f
> dashboard link: https://syzkaller.appspot.com/bug?extid=8785aaf121cfb2141e0d
> compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=107a43bc580000
> 
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/24fd400c6842/disk-be48bcf0.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/59146305635d/vmlinux-be48bcf0.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/b3e5f65cbcc8/bzImage-be48bcf0.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+8785aaf121cfb2141e0d@syzkaller.appspotmail.com
> 
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-1): P6029/1:b..l P1208/1:b..l P6031/3:b..l P6030/1:b..l
> rcu: 	(detected by 1, t=10502 jiffies, g=6285, q=421 ncpus=2)
> task:dhcpcd          state:R  running task     stack:28896 pid:6030  tgid:6030  ppid:5513   task_flags:0x400040 flags:0x00004002
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5357 [inline]
>  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
>  preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
>  irqentry_exit+0x36/0x90 kernel/entry/common.c:197
>  asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
> RIP: 0010:unwind_next_frame+0xfe7/0x20a0 arch/x86/kernel/unwind_orc.c:664
> Code: 85 80 0c 00 00 49 89 6d 40 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 56 10 00 00 <41> 39 5d 00 0f 84 10 06 00 00 bd 01 00 00 00 e9 de f3 ff ff 48 b8
> RSP: 0018:ffffc90003cdf6a8 EFLAGS: 00000246
> RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffc90003ce0000
> RDX: 1ffff9200079bee3 RSI: ffffc90003cdfa70 RDI: ffffc90003cdf758
> RBP: ffffc90003cdfae0 R08: 0000000000000001 R09: 0000000000000000
> R10: ffffc90003cdf718 R11: 00000000000121e6 R12: ffffc90003cdf768
> R13: ffffc90003cdf718 R14: ffffc90003cdfa80 R15: ffffc90003cdf74c
>  arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
>  stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
>  kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
>  kasan_record_aux_stack+0xa7/0xc0 mm/kasan/generic.c:548
>  slab_free_hook mm/slub.c:2378 [inline]
>  slab_free mm/slub.c:4680 [inline]
>  kmem_cache_free+0x15a/0x4d0 mm/slub.c:4782
>  vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
>  do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
>  do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
>  __vm_munmap+0x19a/0x390 mm/vma.c:3155
>  __do_sys_munmap mm/mmap.c:1080 [inline]
>  __se_sys_munmap mm/mmap.c:1077 [inline]
>  __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
>  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>  do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7fb13ec2f2e7
> RSP: 002b:00007fffe10fae78 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
> RAX: ffffffffffffffda RBX: 0000562bd1443510 RCX: 00007fb13ec2f2e7
> RDX: 0000000000000002 RSI: 0000000000004028 RDI: 00007fb13ea1b000
> RBP: 00007fffe10faf80 R08: 0000562bd1432470 R09: 0000000000000001
> R10: 00007fffe10fadb0 R11: 0000000000000206 R12: 00007fffe10faea0
> R13: 00007fb13ec42000 R14: 0000562bd1443510 R15: 0000000000000000
>  </TASK>
> task:syz-executor    state:R  running task     stack:27632 pid:6031  tgid:6031  ppid:5870   task_flags:0x400000 flags:0x00004000
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5357 [inline]
>  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
>  preempt_schedule_common+0x44/0xc0 kernel/sched/core.c:7145
>  preempt_schedule_thunk+0x16/0x30 arch/x86/entry/thunk.S:12
>  __raw_spin_unlock include/linux/spinlock_api_smp.h:143 [inline]
>  _raw_spin_unlock+0x3e/0x50 kernel/locking/spinlock.c:186
>  spin_unlock include/linux/spinlock.h:391 [inline]
>  filemap_map_pages+0xe15/0x1670 mm/filemap.c:3791
>  do_fault_around mm/memory.c:5531 [inline]
>  do_read_fault mm/memory.c:5564 [inline]
>  do_fault mm/memory.c:5707 [inline]
>  do_pte_missing+0xe39/0x3ba0 mm/memory.c:4234
>  handle_pte_fault mm/memory.c:6052 [inline]
>  __handle_mm_fault+0x152a/0x2a50 mm/memory.c:6195
>  handle_mm_fault+0x589/0xd10 mm/memory.c:6364
>  do_user_addr_fault+0x60c/0x1370 arch/x86/mm/fault.c:1336
>  handle_page_fault arch/x86/mm/fault.c:1476 [inline]
>  exc_page_fault+0x5c/0xb0 arch/x86/mm/fault.c:1532
>  asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
> RIP: 0033:0x7f54cd7177c7
> RSP: 002b:00007fffb79a5b40 EFLAGS: 00010246
> RAX: 00007f54ce525000 RBX: 0000000000000000 RCX: 0000000000000064
> RDX: 00007fffb79a5de9 RSI: 0000000000000002 RDI: 00007fffb79a5dd8
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
> R13: 00007fffb79a5c48 R14: 0000000000000000 R15: 0000000000000000
>  </TASK>
> task:kworker/0:3     state:R  running task     stack:25368 pid:1208  tgid:1208  ppid:2      task_flags:0x4208060 flags:0x00004000
> Workqueue: events_power_efficient gc_worker
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5357 [inline]
>  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
>  preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
>  irqentry_exit+0x36/0x90 kernel/entry/common.c:197
>  asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
> RIP: 0010:write_comp_data+0x0/0x90 kernel/kcov.c:240
> Code: 48 8b 05 db b4 1a 12 48 8b 80 30 16 00 00 e9 97 05 db 09 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <49> 89 d2 49 89 f8 49 89 f1 65 48 8b 15 a7 b4 1a 12 65 8b 05 b8 b4
> RSP: 0018:ffffc9000441fb50 EFLAGS: 00000293
> RAX: 0000000000000000 RBX: 0000000000040000 RCX: ffffffff89ba2a52
> RDX: 0000000000040000 RSI: 0000000000000433 RDI: 0000000000000004
> RBP: ffffffff9b2c41ec R08: 0000000000000004 R09: 0000000000000000
> R10: 0000000000000000 R11: ffffffff9b030610 R12: ffff888031800000
> R13: 0000000000000433 R14: dffffc0000000000 R15: 0000000000001770
>  gc_worker+0x342/0x16e0 net/netfilter/nf_conntrack_core.c:1549
>  process_one_work+0x9cc/0x1b70 kernel/workqueue.c:3236
>  process_scheduled_works kernel/workqueue.c:3319 [inline]
>  worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400
>  kthread+0x3c5/0x780 kernel/kthread.c:463
>  ret_from_fork+0x5d7/0x6f0 arch/x86/kernel/process.c:148
>  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
>  </TASK>
> task:dhcpcd          state:R  running task     stack:26072 pid:6029  tgid:6029  ppid:5513   task_flags:0x400040 flags:0x00004002
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5357 [inline]
>  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
>  preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
>  irqentry_exit+0x36/0x90 kernel/entry/common.c:197
>  asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
> RIP: 0010:orc_ip arch/x86/kernel/unwind_orc.c:80 [inline]
> RIP: 0010:__orc_find+0x7e/0xf0 arch/x86/kernel/unwind_orc.c:102
> Code: ea 3f 48 c1 fe 02 48 01 f2 48 d1 fa 48 8d 5c 95 00 48 89 da 48 c1 ea 03 0f b6 34 0a 48 89 da 83 e2 07 83 c2 03 40 38 f2 7c 05 <40> 84 f6 75 4b 48 63 13 48 01 da 49 39 d5 73 af 4c 8d 63 fc 49 39
> RSP: 0018:ffffc90003337648 EFLAGS: 00000202
> RAX: ffffffff914e0dd8 RBX: ffffffff90c5215c RCX: dffffc0000000000
> RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffffffff90c52148
> RBP: ffffffff90c52148 R08: ffffffff914e0e1a R09: 0000000000000000
> R10: ffffc900033376f8 R11: 0000000000011271 R12: ffffffff90c52170
> R13: ffffffff82127173 R14: ffffffff90c52148 R15: ffffffff90c52148
>  orc_find arch/x86/kernel/unwind_orc.c:227 [inline]
>  unwind_next_frame+0x2ec/0x20a0 arch/x86/kernel/unwind_orc.c:494
>  arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
>  stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
>  kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
>  kasan_save_track+0x14/0x30 mm/kasan/common.c:68
>  poison_kmalloc_redzone mm/kasan/common.c:388 [inline]
>  __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:405
>  kmalloc_noprof include/linux/slab.h:905 [inline]
>  slab_free_hook mm/slub.c:2369 [inline]
>  slab_free mm/slub.c:4680 [inline]
>  kmem_cache_free+0x142/0x4d0 mm/slub.c:4782
>  vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
>  do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
>  do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
>  __vm_munmap+0x19a/0x390 mm/vma.c:3155
>  __do_sys_munmap mm/mmap.c:1080 [inline]
>  __se_sys_munmap mm/mmap.c:1077 [inline]
>  __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
>  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>  do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7fb13ec2f2e7
> RSP: 002b:00007fffe10fae78 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
> RAX: ffffffffffffffda RBX: 0000562bd1443f00 RCX: 00007fb13ec2f2e7
> RDX: 0000000000000001 RSI: 000000000002f6d0 RDI: 00007fb13e9c1000
> RBP: 00007fffe10faf80 R08: 00000000000004f0 R09: 0000000000000002
> R10: 00007fffe10fadb0 R11: 0000000000000202 R12: 00007fffe10faec0
> R13: 00007fb13ec42000 R14: 0000562bd1443f00 R15: 0000000000000000
>  </TASK>
> 
> 

Let's see if speeding up the debug checks by skipping validate_mm() helps.

#syz test:

--- a/mm/vma.c
+++ b/mm/vma.c
@@ -648,6 +648,7 @@ void validate_mm(struct mm_struct *mm)
        struct vm_area_struct *vma;
        VMA_ITERATOR(vmi, mm, 0);
 
+       return;
        mt_validate(&mm->mm_mt);
        for_each_vma(vmi, vma) {
 #ifdef CONFIG_DEBUG_VM_RB



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)
  2025-08-28  2:05 ` Liam R. Howlett
@ 2025-08-28  2:05   ` syzbot
  0 siblings, 0 replies; 11+ messages in thread
From: syzbot @ 2025-08-28  2:05 UTC (permalink / raw)
  To: liam.howlett
  Cc: akpm, jannh, liam.howlett, linux-kernel, linux-mm,
	lorenzo.stoakes, pfalcato, syzkaller-bugs, vbabka

> * syzbot <syzbot+8785aaf121cfb2141e0d@syzkaller.appspotmail.com> [250822 00:15]:
>> Hello,
>> 
>> syzbot found the following issue on:
>> 
>> HEAD commit:    be48bcf004f9 Merge tag 'for-6.17-rc2-tag' of git://git.ker..
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=136dfba2580000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=142508fb116c212f
>> dashboard link: https://syzkaller.appspot.com/bug?extid=8785aaf121cfb2141e0d
>> compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=107a43bc580000
>> 
>> Downloadable assets:
>> disk image: https://storage.googleapis.com/syzbot-assets/24fd400c6842/disk-be48bcf0.raw.xz
>> vmlinux: https://storage.googleapis.com/syzbot-assets/59146305635d/vmlinux-be48bcf0.xz
>> kernel image: https://storage.googleapis.com/syzbot-assets/b3e5f65cbcc8/bzImage-be48bcf0.xz
>> 
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+8785aaf121cfb2141e0d@syzkaller.appspotmail.com
>> 
>> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>> rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-1): P6029/1:b..l P1208/1:b..l P6031/3:b..l P6030/1:b..l
>> rcu: 	(detected by 1, t=10502 jiffies, g=6285, q=421 ncpus=2)
>> task:dhcpcd          state:R  running task     stack:28896 pid:6030  tgid:6030  ppid:5513   task_flags:0x400040 flags:0x00004002
>> Call Trace:
>>  <TASK>
>>  context_switch kernel/sched/core.c:5357 [inline]
>>  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
>>  preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
>>  irqentry_exit+0x36/0x90 kernel/entry/common.c:197
>>  asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
>> RIP: 0010:unwind_next_frame+0xfe7/0x20a0 arch/x86/kernel/unwind_orc.c:664
>> Code: 85 80 0c 00 00 49 89 6d 40 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 56 10 00 00 <41> 39 5d 00 0f 84 10 06 00 00 bd 01 00 00 00 e9 de f3 ff ff 48 b8
>> RSP: 0018:ffffc90003cdf6a8 EFLAGS: 00000246
>> RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffc90003ce0000
>> RDX: 1ffff9200079bee3 RSI: ffffc90003cdfa70 RDI: ffffc90003cdf758
>> RBP: ffffc90003cdfae0 R08: 0000000000000001 R09: 0000000000000000
>> R10: ffffc90003cdf718 R11: 00000000000121e6 R12: ffffc90003cdf768
>> R13: ffffc90003cdf718 R14: ffffc90003cdfa80 R15: ffffc90003cdf74c
>>  arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
>>  stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
>>  kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
>>  kasan_record_aux_stack+0xa7/0xc0 mm/kasan/generic.c:548
>>  slab_free_hook mm/slub.c:2378 [inline]
>>  slab_free mm/slub.c:4680 [inline]
>>  kmem_cache_free+0x15a/0x4d0 mm/slub.c:4782
>>  vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
>>  do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
>>  do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
>>  __vm_munmap+0x19a/0x390 mm/vma.c:3155
>>  __do_sys_munmap mm/mmap.c:1080 [inline]
>>  __se_sys_munmap mm/mmap.c:1077 [inline]
>>  __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
>>  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>>  do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
>>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
>> RIP: 0033:0x7fb13ec2f2e7
>> RSP: 002b:00007fffe10fae78 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
>> RAX: ffffffffffffffda RBX: 0000562bd1443510 RCX: 00007fb13ec2f2e7
>> RDX: 0000000000000002 RSI: 0000000000004028 RDI: 00007fb13ea1b000
>> RBP: 00007fffe10faf80 R08: 0000562bd1432470 R09: 0000000000000001
>> R10: 00007fffe10fadb0 R11: 0000000000000206 R12: 00007fffe10faea0
>> R13: 00007fb13ec42000 R14: 0000562bd1443510 R15: 0000000000000000
>>  </TASK>
>> task:syz-executor    state:R  running task     stack:27632 pid:6031  tgid:6031  ppid:5870   task_flags:0x400000 flags:0x00004000
>> Call Trace:
>>  <TASK>
>>  context_switch kernel/sched/core.c:5357 [inline]
>>  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
>>  preempt_schedule_common+0x44/0xc0 kernel/sched/core.c:7145
>>  preempt_schedule_thunk+0x16/0x30 arch/x86/entry/thunk.S:12
>>  __raw_spin_unlock include/linux/spinlock_api_smp.h:143 [inline]
>>  _raw_spin_unlock+0x3e/0x50 kernel/locking/spinlock.c:186
>>  spin_unlock include/linux/spinlock.h:391 [inline]
>>  filemap_map_pages+0xe15/0x1670 mm/filemap.c:3791
>>  do_fault_around mm/memory.c:5531 [inline]
>>  do_read_fault mm/memory.c:5564 [inline]
>>  do_fault mm/memory.c:5707 [inline]
>>  do_pte_missing+0xe39/0x3ba0 mm/memory.c:4234
>>  handle_pte_fault mm/memory.c:6052 [inline]
>>  __handle_mm_fault+0x152a/0x2a50 mm/memory.c:6195
>>  handle_mm_fault+0x589/0xd10 mm/memory.c:6364
>>  do_user_addr_fault+0x60c/0x1370 arch/x86/mm/fault.c:1336
>>  handle_page_fault arch/x86/mm/fault.c:1476 [inline]
>>  exc_page_fault+0x5c/0xb0 arch/x86/mm/fault.c:1532
>>  asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
>> RIP: 0033:0x7f54cd7177c7
>> RSP: 002b:00007fffb79a5b40 EFLAGS: 00010246
>> RAX: 00007f54ce525000 RBX: 0000000000000000 RCX: 0000000000000064
>> RDX: 00007fffb79a5de9 RSI: 0000000000000002 RDI: 00007fffb79a5dd8
>> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
>> R13: 00007fffb79a5c48 R14: 0000000000000000 R15: 0000000000000000
>>  </TASK>
>> task:kworker/0:3     state:R  running task     stack:25368 pid:1208  tgid:1208  ppid:2      task_flags:0x4208060 flags:0x00004000
>> Workqueue: events_power_efficient gc_worker
>> Call Trace:
>>  <TASK>
>>  context_switch kernel/sched/core.c:5357 [inline]
>>  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
>>  preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
>>  irqentry_exit+0x36/0x90 kernel/entry/common.c:197
>>  asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
>> RIP: 0010:write_comp_data+0x0/0x90 kernel/kcov.c:240
>> Code: 48 8b 05 db b4 1a 12 48 8b 80 30 16 00 00 e9 97 05 db 09 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <49> 89 d2 49 89 f8 49 89 f1 65 48 8b 15 a7 b4 1a 12 65 8b 05 b8 b4
>> RSP: 0018:ffffc9000441fb50 EFLAGS: 00000293
>> RAX: 0000000000000000 RBX: 0000000000040000 RCX: ffffffff89ba2a52
>> RDX: 0000000000040000 RSI: 0000000000000433 RDI: 0000000000000004
>> RBP: ffffffff9b2c41ec R08: 0000000000000004 R09: 0000000000000000
>> R10: 0000000000000000 R11: ffffffff9b030610 R12: ffff888031800000
>> R13: 0000000000000433 R14: dffffc0000000000 R15: 0000000000001770
>>  gc_worker+0x342/0x16e0 net/netfilter/nf_conntrack_core.c:1549
>>  process_one_work+0x9cc/0x1b70 kernel/workqueue.c:3236
>>  process_scheduled_works kernel/workqueue.c:3319 [inline]
>>  worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400
>>  kthread+0x3c5/0x780 kernel/kthread.c:463
>>  ret_from_fork+0x5d7/0x6f0 arch/x86/kernel/process.c:148
>>  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
>>  </TASK>
>> task:dhcpcd          state:R  running task     stack:26072 pid:6029  tgid:6029  ppid:5513   task_flags:0x400040 flags:0x00004002
>> Call Trace:
>>  <TASK>
>>  context_switch kernel/sched/core.c:5357 [inline]
>>  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
>>  preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
>>  irqentry_exit+0x36/0x90 kernel/entry/common.c:197
>>  asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
>> RIP: 0010:orc_ip arch/x86/kernel/unwind_orc.c:80 [inline]
>> RIP: 0010:__orc_find+0x7e/0xf0 arch/x86/kernel/unwind_orc.c:102
>> Code: ea 3f 48 c1 fe 02 48 01 f2 48 d1 fa 48 8d 5c 95 00 48 89 da 48 c1 ea 03 0f b6 34 0a 48 89 da 83 e2 07 83 c2 03 40 38 f2 7c 05 <40> 84 f6 75 4b 48 63 13 48 01 da 49 39 d5 73 af 4c 8d 63 fc 49 39
>> RSP: 0018:ffffc90003337648 EFLAGS: 00000202
>> RAX: ffffffff914e0dd8 RBX: ffffffff90c5215c RCX: dffffc0000000000
>> RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffffffff90c52148
>> RBP: ffffffff90c52148 R08: ffffffff914e0e1a R09: 0000000000000000
>> R10: ffffc900033376f8 R11: 0000000000011271 R12: ffffffff90c52170
>> R13: ffffffff82127173 R14: ffffffff90c52148 R15: ffffffff90c52148
>>  orc_find arch/x86/kernel/unwind_orc.c:227 [inline]
>>  unwind_next_frame+0x2ec/0x20a0 arch/x86/kernel/unwind_orc.c:494
>>  arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
>>  stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
>>  kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
>>  kasan_save_track+0x14/0x30 mm/kasan/common.c:68
>>  poison_kmalloc_redzone mm/kasan/common.c:388 [inline]
>>  __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:405
>>  kmalloc_noprof include/linux/slab.h:905 [inline]
>>  slab_free_hook mm/slub.c:2369 [inline]
>>  slab_free mm/slub.c:4680 [inline]
>>  kmem_cache_free+0x142/0x4d0 mm/slub.c:4782
>>  vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
>>  do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
>>  do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
>>  __vm_munmap+0x19a/0x390 mm/vma.c:3155
>>  __do_sys_munmap mm/mmap.c:1080 [inline]
>>  __se_sys_munmap mm/mmap.c:1077 [inline]
>>  __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
>>  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>>  do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
>>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
>> RIP: 0033:0x7fb13ec2f2e7
>> RSP: 002b:00007fffe10fae78 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
>> RAX: ffffffffffffffda RBX: 0000562bd1443f00 RCX: 00007fb13ec2f2e7
>> RDX: 0000000000000001 RSI: 000000000002f6d0 RDI: 00007fb13e9c1000
>> RBP: 00007fffe10faf80 R08: 00000000000004f0 R09: 0000000000000002
>> R10: 00007fffe10fadb0 R11: 0000000000000202 R12: 00007fffe10faec0
>> R13: 00007fb13ec42000 R14: 0000562bd1443f00 R15: 0000000000000000
>>  </TASK>
>> 
>> 
>
> Let's see if speeding up the debug helps.
>
> #syz test:

"---" does not look like a valid git repo address.

>
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -648,6 +648,7 @@ void validate_mm(struct mm_struct *mm)
>         struct vm_area_struct *vma;
>         VMA_ITERATOR(vmi, mm, 0);
>  
> +       return;
>         mt_validate(&mm->mm_mt);
>         for_each_vma(vmi, vma) {
>  #ifdef CONFIG_DEBUG_VM_RB
>



* Re: [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)
  2025-08-22  4:15 [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2) syzbot
  2025-08-22 12:08 ` Lorenzo Stoakes
  2025-08-28  2:05 ` Liam R. Howlett
@ 2025-08-28  2:20 ` Liam R. Howlett
  2025-08-28  3:08   ` syzbot
  2025-12-19 19:37 ` syzbot
  3 siblings, 1 reply; 11+ messages in thread
From: Liam R. Howlett @ 2025-08-28  2:20 UTC (permalink / raw)
  To: syzbot
  Cc: akpm, jannh, linux-kernel, linux-mm, lorenzo.stoakes, pfalcato,
	syzkaller-bugs, vbabka

* syzbot <syzbot+8785aaf121cfb2141e0d@syzkaller.appspotmail.com> [250822 00:15]:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    be48bcf004f9 Merge tag 'for-6.17-rc2-tag' of git://git.ker..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=136dfba2580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=142508fb116c212f
> dashboard link: https://syzkaller.appspot.com/bug?extid=8785aaf121cfb2141e0d
> compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=107a43bc580000
> 
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/24fd400c6842/disk-be48bcf0.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/59146305635d/vmlinux-be48bcf0.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/b3e5f65cbcc8/bzImage-be48bcf0.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+8785aaf121cfb2141e0d@syzkaller.appspotmail.com

Apparently I have no idea how to do this. Let's try again.

v6.17-rc2 + skipping validate_mm().

#syz test git://git.infradead.org/users/jedix/linux-maple.git no_validate

> 
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-1): P6029/1:b..l P1208/1:b..l P6031/3:b..l P6030/1:b..l
> rcu: 	(detected by 1, t=10502 jiffies, g=6285, q=421 ncpus=2)
> task:dhcpcd          state:R  running task     stack:28896 pid:6030  tgid:6030  ppid:5513   task_flags:0x400040 flags:0x00004002
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5357 [inline]
>  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
>  preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
>  irqentry_exit+0x36/0x90 kernel/entry/common.c:197
>  asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
> RIP: 0010:unwind_next_frame+0xfe7/0x20a0 arch/x86/kernel/unwind_orc.c:664
> Code: 85 80 0c 00 00 49 89 6d 40 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 56 10 00 00 <41> 39 5d 00 0f 84 10 06 00 00 bd 01 00 00 00 e9 de f3 ff ff 48 b8
> RSP: 0018:ffffc90003cdf6a8 EFLAGS: 00000246
> RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffc90003ce0000
> RDX: 1ffff9200079bee3 RSI: ffffc90003cdfa70 RDI: ffffc90003cdf758
> RBP: ffffc90003cdfae0 R08: 0000000000000001 R09: 0000000000000000
> R10: ffffc90003cdf718 R11: 00000000000121e6 R12: ffffc90003cdf768
> R13: ffffc90003cdf718 R14: ffffc90003cdfa80 R15: ffffc90003cdf74c
>  arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
>  stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
>  kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
>  kasan_record_aux_stack+0xa7/0xc0 mm/kasan/generic.c:548
>  slab_free_hook mm/slub.c:2378 [inline]
>  slab_free mm/slub.c:4680 [inline]
>  kmem_cache_free+0x15a/0x4d0 mm/slub.c:4782
>  vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
>  do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
>  do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
>  __vm_munmap+0x19a/0x390 mm/vma.c:3155
>  __do_sys_munmap mm/mmap.c:1080 [inline]
>  __se_sys_munmap mm/mmap.c:1077 [inline]
>  __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
>  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>  do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7fb13ec2f2e7
> RSP: 002b:00007fffe10fae78 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
> RAX: ffffffffffffffda RBX: 0000562bd1443510 RCX: 00007fb13ec2f2e7
> RDX: 0000000000000002 RSI: 0000000000004028 RDI: 00007fb13ea1b000
> RBP: 00007fffe10faf80 R08: 0000562bd1432470 R09: 0000000000000001
> R10: 00007fffe10fadb0 R11: 0000000000000206 R12: 00007fffe10faea0
> R13: 00007fb13ec42000 R14: 0000562bd1443510 R15: 0000000000000000
>  </TASK>
> task:syz-executor    state:R  running task     stack:27632 pid:6031  tgid:6031  ppid:5870   task_flags:0x400000 flags:0x00004000
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5357 [inline]
>  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
>  preempt_schedule_common+0x44/0xc0 kernel/sched/core.c:7145
>  preempt_schedule_thunk+0x16/0x30 arch/x86/entry/thunk.S:12
>  __raw_spin_unlock include/linux/spinlock_api_smp.h:143 [inline]
>  _raw_spin_unlock+0x3e/0x50 kernel/locking/spinlock.c:186
>  spin_unlock include/linux/spinlock.h:391 [inline]
>  filemap_map_pages+0xe15/0x1670 mm/filemap.c:3791
>  do_fault_around mm/memory.c:5531 [inline]
>  do_read_fault mm/memory.c:5564 [inline]
>  do_fault mm/memory.c:5707 [inline]
>  do_pte_missing+0xe39/0x3ba0 mm/memory.c:4234
>  handle_pte_fault mm/memory.c:6052 [inline]
>  __handle_mm_fault+0x152a/0x2a50 mm/memory.c:6195
>  handle_mm_fault+0x589/0xd10 mm/memory.c:6364
>  do_user_addr_fault+0x60c/0x1370 arch/x86/mm/fault.c:1336
>  handle_page_fault arch/x86/mm/fault.c:1476 [inline]
>  exc_page_fault+0x5c/0xb0 arch/x86/mm/fault.c:1532
>  asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
> RIP: 0033:0x7f54cd7177c7
> RSP: 002b:00007fffb79a5b40 EFLAGS: 00010246
> RAX: 00007f54ce525000 RBX: 0000000000000000 RCX: 0000000000000064
> RDX: 00007fffb79a5de9 RSI: 0000000000000002 RDI: 00007fffb79a5dd8
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
> R13: 00007fffb79a5c48 R14: 0000000000000000 R15: 0000000000000000
>  </TASK>
> task:kworker/0:3     state:R  running task     stack:25368 pid:1208  tgid:1208  ppid:2      task_flags:0x4208060 flags:0x00004000
> Workqueue: events_power_efficient gc_worker
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5357 [inline]
>  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
>  preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
>  irqentry_exit+0x36/0x90 kernel/entry/common.c:197
>  asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
> RIP: 0010:write_comp_data+0x0/0x90 kernel/kcov.c:240
> Code: 48 8b 05 db b4 1a 12 48 8b 80 30 16 00 00 e9 97 05 db 09 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <49> 89 d2 49 89 f8 49 89 f1 65 48 8b 15 a7 b4 1a 12 65 8b 05 b8 b4
> RSP: 0018:ffffc9000441fb50 EFLAGS: 00000293
> RAX: 0000000000000000 RBX: 0000000000040000 RCX: ffffffff89ba2a52
> RDX: 0000000000040000 RSI: 0000000000000433 RDI: 0000000000000004
> RBP: ffffffff9b2c41ec R08: 0000000000000004 R09: 0000000000000000
> R10: 0000000000000000 R11: ffffffff9b030610 R12: ffff888031800000
> R13: 0000000000000433 R14: dffffc0000000000 R15: 0000000000001770
>  gc_worker+0x342/0x16e0 net/netfilter/nf_conntrack_core.c:1549
>  process_one_work+0x9cc/0x1b70 kernel/workqueue.c:3236
>  process_scheduled_works kernel/workqueue.c:3319 [inline]
>  worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400
>  kthread+0x3c5/0x780 kernel/kthread.c:463
>  ret_from_fork+0x5d7/0x6f0 arch/x86/kernel/process.c:148
>  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
>  </TASK>
> task:dhcpcd          state:R  running task     stack:26072 pid:6029  tgid:6029  ppid:5513   task_flags:0x400040 flags:0x00004002
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5357 [inline]
>  __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
>  preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
>  irqentry_exit+0x36/0x90 kernel/entry/common.c:197
>  asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
> RIP: 0010:orc_ip arch/x86/kernel/unwind_orc.c:80 [inline]
> RIP: 0010:__orc_find+0x7e/0xf0 arch/x86/kernel/unwind_orc.c:102
> Code: ea 3f 48 c1 fe 02 48 01 f2 48 d1 fa 48 8d 5c 95 00 48 89 da 48 c1 ea 03 0f b6 34 0a 48 89 da 83 e2 07 83 c2 03 40 38 f2 7c 05 <40> 84 f6 75 4b 48 63 13 48 01 da 49 39 d5 73 af 4c 8d 63 fc 49 39
> RSP: 0018:ffffc90003337648 EFLAGS: 00000202
> RAX: ffffffff914e0dd8 RBX: ffffffff90c5215c RCX: dffffc0000000000
> RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffffffff90c52148
> RBP: ffffffff90c52148 R08: ffffffff914e0e1a R09: 0000000000000000
> R10: ffffc900033376f8 R11: 0000000000011271 R12: ffffffff90c52170
> R13: ffffffff82127173 R14: ffffffff90c52148 R15: ffffffff90c52148
>  orc_find arch/x86/kernel/unwind_orc.c:227 [inline]
>  unwind_next_frame+0x2ec/0x20a0 arch/x86/kernel/unwind_orc.c:494
>  arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
>  stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
>  kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
>  kasan_save_track+0x14/0x30 mm/kasan/common.c:68
>  poison_kmalloc_redzone mm/kasan/common.c:388 [inline]
>  __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:405
>  kmalloc_noprof include/linux/slab.h:905 [inline]
>  slab_free_hook mm/slub.c:2369 [inline]
>  slab_free mm/slub.c:4680 [inline]
>  kmem_cache_free+0x142/0x4d0 mm/slub.c:4782
>  vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
>  do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
>  do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
>  __vm_munmap+0x19a/0x390 mm/vma.c:3155
>  __do_sys_munmap mm/mmap.c:1080 [inline]
>  __se_sys_munmap mm/mmap.c:1077 [inline]
>  __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
>  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>  do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7fb13ec2f2e7
> RSP: 002b:00007fffe10fae78 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
> RAX: ffffffffffffffda RBX: 0000562bd1443f00 RCX: 00007fb13ec2f2e7
> RDX: 0000000000000001 RSI: 000000000002f6d0 RDI: 00007fb13e9c1000
> RBP: 00007fffe10faf80 R08: 00000000000004f0 R09: 0000000000000002
> R10: 00007fffe10fadb0 R11: 0000000000000202 R12: 00007fffe10faec0
> R13: 00007fb13ec42000 R14: 0000562bd1443f00 R15: 0000000000000000
>  </TASK>
> 
> 



* Re: [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)
  2025-08-28  2:20 ` Liam R. Howlett
@ 2025-08-28  3:08   ` syzbot
  0 siblings, 0 replies; 11+ messages in thread
From: syzbot @ 2025-08-28  3:08 UTC (permalink / raw)
  To: akpm, jannh, liam.howlett, linux-kernel, linux-mm,
	lorenzo.stoakes, pfalcato, syzkaller-bugs, vbabka

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: rcu detected stall in corrupted

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-1): P5218/1:b..l
rcu: 	(detected by 0, t=10502 jiffies, g=10417, q=327 ncpus=2)
task:udevd           state:R  running task     stack:26640 pid:5218  tgid:5218  ppid:1      task_flags:0x400140 flags:0x00004002
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5357 [inline]
 __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
 preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
 irqentry_exit+0x36/0x90 kernel/entry/common.c:197
 asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
RIP: 0010:lock_acquire+0x30/0x350 kernel/locking/lockdep.c:5828
Code: 4d 89 cf 41 56 41 89 f6 41 55 41 89 d5 41 54 45 89 c4 55 89 cd 53 48 89 fb 48 83 ec 38 65 48 8b 05 0d 79 3e 12 48 89 44 24 30 <31> c0 66 90 65 8b 05 29 79 3e 12 83 f8 07 0f 87 bc 02 00 00 89 c0
RSP: 0018:ffffc90003d0f530 EFLAGS: 00000286
RAX: 4b548df46ee33600 RBX: ffffffff8e5c11e0 RCX: 0000000000000002
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8e5c11e0
RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
R10: ffffc90003d0f618 R11: 00000000000135a3 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
 rcu_read_lock include/linux/rcupdate.h:841 [inline]
 class_rcu_constructor include/linux/rcupdate.h:1155 [inline]
 unwind_next_frame+0xd1/0x20a0 arch/x86/kernel/unwind_orc.c:479
 arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
 stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
 kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
 kasan_save_track+0x14/0x30 mm/kasan/common.c:68
 kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:576
 poison_slab_object mm/kasan/common.c:243 [inline]
 __kasan_slab_free+0x60/0x70 mm/kasan/common.c:275
 kasan_slab_free include/linux/kasan.h:233 [inline]
 slab_free_hook mm/slub.c:2417 [inline]
 slab_free mm/slub.c:4680 [inline]
 kfree+0x2b4/0x4d0 mm/slub.c:4879
 tomoyo_realpath_from_path+0x19f/0x6e0 security/tomoyo/realpath.c:286
 tomoyo_get_realpath security/tomoyo/file.c:151 [inline]
 tomoyo_path_perm+0x274/0x460 security/tomoyo/file.c:822
 security_inode_getattr+0x116/0x290 security/security.c:2377
 vfs_getattr fs/stat.c:259 [inline]
 vfs_statx_path fs/stat.c:299 [inline]
 vfs_statx+0x121/0x3f0 fs/stat.c:356
 vfs_fstatat+0x7b/0xf0 fs/stat.c:375
 __do_sys_newfstatat+0x97/0x120 fs/stat.c:542
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f47f6d11b0a
RSP: 002b:00007ffd84c35818 EFLAGS: 00000246 ORIG_RAX: 0000000000000106
RAX: ffffffffffffffda RBX: 0000559b20c5e418 RCX: 00007f47f6d11b0a
RDX: 00007ffd84c35820 RSI: 0000559b20c4cef3 RDI: 00000000ffffff9c
RBP: 0000559b5aa6d668 R08: 00063d641a57c867 R09: 00007f47f7457000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffd84c35820 R14: 0000000000000000 R15: 00063d641a57c867
 </TASK>
rcu: rcu_preempt kthread starved for 966 jiffies! g10417 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:R  running task     stack:28936 pid:16    tgid:16    ppid:2      task_flags:0x208040 flags:0x00004000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5357 [inline]
 __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
 __schedule_loop kernel/sched/core.c:7043 [inline]
 schedule+0xe7/0x3a0 kernel/sched/core.c:7058
 schedule_timeout+0x123/0x290 kernel/time/sleep_timeout.c:99
 rcu_gp_fqs_loop+0x1ea/0xb00 kernel/rcu/tree.c:2083
 rcu_gp_kthread+0x270/0x380 kernel/rcu/tree.c:2285
 kthread+0x3c2/0x780 kernel/kthread.c:463
 ret_from_fork+0x5d7/0x6f0 arch/x86/kernel/process.c:148
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>
rcu: Stack dump where RCU GP kthread last ran:
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 UID: 0 PID: 6556 Comm: syz.1.23 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
RIP: 0010:arch_static_branch arch/x86/include/asm/jump_label.h:36 [inline]
RIP: 0010:native_write_msr arch/x86/include/asm/msr.h:139 [inline]
RIP: 0010:wrmsrq arch/x86/include/asm/msr.h:199 [inline]
RIP: 0010:native_apic_msr_write arch/x86/include/asm/apic.h:212 [inline]
RIP: 0010:native_apic_msr_write+0x28/0x40 arch/x86/include/asm/apic.h:206
Code: 90 90 f3 0f 1e fa 8d 87 30 ff ff ff 83 e0 ef 74 20 89 f8 83 e0 ef 83 f8 20 74 16 c1 ef 04 31 d2 89 f0 8d 8f 00 08 00 00 0f 30 <66> 90 c3 cc cc cc cc c3 cc cc cc cc 89 f6 31 d2 89 cf e9 b1 4d ae
RSP: 0018:ffffc900031479f0 EFLAGS: 00000046
RAX: 000000000000003e RBX: ffff8880b8523a00 RCX: 0000000000000838
RDX: 0000000000000000 RSI: 000000000000003e RDI: 0000000000000038
RBP: 000000000000003e R08: 0000000000000005 R09: 000000000000003f
R10: 0000000000000020 R11: ffffffff9b0d2580 R12: dffffc0000000000
R13: 0000000000000000 R14: 0000000000000020 R15: ffffed10170a4745
FS:  0000555558775500(0000) GS:ffff8881247bc000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c00732e000 CR3: 0000000077210000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 apic_write arch/x86/include/asm/apic.h:405 [inline]
 lapic_next_event+0x10/0x20 arch/x86/kernel/apic/apic.c:416
 clockevents_program_min_delta+0x173/0x3a0 kernel/time/clockevents.c:248
 clockevents_program_event+0x2a6/0x380 kernel/time/clockevents.c:336
 tick_program_event+0xa9/0x140 kernel/time/tick-oneshot.c:44
 __hrtimer_reprogram kernel/time/hrtimer.c:685 [inline]
 __hrtimer_reprogram kernel/time/hrtimer.c:659 [inline]
 hrtimer_reprogram+0x27b/0x450 kernel/time/hrtimer.c:868
 hrtimer_start_range_ns+0x9d4/0xfc0 kernel/time/hrtimer.c:1330
 __posixtimer_deliver_signal kernel/time/posix-timers.c:322 [inline]
 posixtimer_deliver_signal+0x30d/0x6b0 kernel/time/posix-timers.c:348
 dequeue_signal+0x307/0x520 kernel/signal.c:660
 get_signal+0x602/0x26d0 kernel/signal.c:2914
 arch_do_signal_or_restart+0x8f/0x7d0 arch/x86/kernel/signal.c:337
 exit_to_user_mode_loop+0x84/0x110 kernel/entry/common.c:40
 exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
 syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
 syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
 do_syscall_64+0x3f6/0x4c0 arch/x86/entry/syscall_64.c:100
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5114d8ebe9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffcc5e18098 EFLAGS: 00000246
RAX: fffffffffffffffc RBX: 00000000000231ee RCX: 00007f5114d8ebe9
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f5114fb5fac
RBP: 0000000000000032 R08: 00007f5115bce000 R09: 00000012c5e1838f
R10: 00007ffcc5e18190 R11: 0000000000000246 R12: 00007f5114fb5fac
R13: 00007ffcc5e18190 R14: 0000000000023220 R15: 00007ffcc5e181b0
 </TASK>


Tested on:

commit:         a1617343 skip the validate_mm() for stall test
git tree:       git://git.infradead.org/users/jedix/linux-maple.git no_validate
console output: https://syzkaller.appspot.com/x/log.txt?x=13441fbc580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=d4703ac89d9e185a
dashboard link: https://syzkaller.appspot.com/bug?extid=8785aaf121cfb2141e0d
compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40

Note: no patches were applied.
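
For comparing the stall reports in this thread, a small script can pull the key numbers out of the rcu lines. This is only a sketch: the regular expressions are assumptions based on the log text quoted above rather than on the kernel's format strings, and `summarize_stall` is a hypothetical helper name. Jiffies are left unconverted because the HZ value of the test kernel is not stated here.

```python
import re

# Patterns for the rcu lines quoted in this report. The layout is assumed
# from the log text above, not taken from the kernel's printk format strings.
DETECT_RE = re.compile(
    r"\(detected by (?P<cpu>\d+), t=(?P<t>\d+) jiffies, "
    r"g=(?P<g>\d+), q=(?P<q>\d+) ncpus=(?P<ncpus>\d+)\)"
)
STARVED_RE = re.compile(r"rcu_preempt kthread starved for (?P<starved>\d+) jiffies")
TASK_RE = re.compile(r"\bP(\d+)/")


def summarize_stall(log: str) -> dict:
    """Collect stall duration, blocked PIDs and GP-kthread starvation from a log."""
    summary = {"blocked_pids": [], "t_jiffies": None, "starved_jiffies": None}
    for line in log.splitlines():
        if "Tasks blocked on" in line:
            # Entries look like "P5218/1:b..l"; only the PID is extracted here.
            summary["blocked_pids"] = [int(p) for p in TASK_RE.findall(line)]
        m = DETECT_RE.search(line)
        if m:
            summary["t_jiffies"] = int(m.group("t"))
        m = STARVED_RE.search(line)
        if m:
            summary["starved_jiffies"] = int(m.group("starved"))
    return summary


# Lines copied from the test result above.
sample = """\
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: \tTasks blocked on level-0 rcu_node (CPUs 0-1): P5218/1:b..l
rcu: \t(detected by 0, t=10502 jiffies, g=10417, q=327 ncpus=2)
rcu: rcu_preempt kthread starved for 966 jiffies! g10417 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
"""
print(summarize_stall(sample))
```

On this test log the sketch reports PID 5218 blocked, t=10502 jiffies, and the grace-period kthread starved for 966 jiffies. The 2025-12-19 report in this thread has the same t value but 10455 starved jiffies.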



* Re: [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)
  2025-08-28  1:57       ` Liam R. Howlett
@ 2025-08-28  3:35         ` Liam R. Howlett
  0 siblings, 0 replies; 11+ messages in thread
From: Liam R. Howlett @ 2025-08-28  3:35 UTC (permalink / raw)
  To: Josh Poimboeuf, Harry Yoo, Lorenzo Stoakes, syzbot, akpm, jannh,
	linux-kernel, linux-mm, pfalcato, syzkaller-bugs, vbabka,
	Sebastian Andrzej Siewior, peterz

* Liam R. Howlett <Liam.Howlett@oracle.com> [250827 21:58]:
> * Josh Poimboeuf <jpoimboe@kernel.org> [250827 20:29]:
> > On Fri, Aug 22, 2025 at 10:55:10PM +0900, Harry Yoo wrote:
> > > On Fri, Aug 22, 2025 at 01:08:02PM +0100, Lorenzo Stoakes wrote:
> > > > +cc Sebastian for RCU ORC change...
> > > > 
> > > > +cc Harry for slab side.
> > > 
> > > +cc Josh and Peter for stack unwinding stuff.
> > > 
> > > > Pinging Jann for the CONFIG_SLUB_RCU_DEBUG element.
> > > > 
> > > > Jann - could this possibly be related to CONFIG_SLUB_RCU_DEBUG? It seems
> > > > the stack is within KASAN, but there is no KASAN report, so maybe it's KASAN
> > > > itself that's having an issue?
> > > > 
> > > > Though I'm thinking maybe it's the ORC unwinder itself that could be problematic
> > > > here (albeit invoked via CONFIG_SLUB_RCU_DEBUG)... and yeah, kinda suspicious
> > > > because:
> > > > 
> > > > - We have two threads freeing VMAs using SLAB_TYPESAFE_BY_RCU
> > > > - CONFIG_SLUB_RCU_DEBUG means that we use KASAN to save an aux stack, which
> > > >   makes us do an unwind via ORC, which then takes an RCU read lock on
> > > >   unwind_next_frame(), and both are doing this unwinding at the time of report.
> > > > - ???
> > > > - Somehow things get locked up?
> > > > 
> > > > I'm not an RCU expert (clearly :) so I'm not sure exactly how this could result
> > > > in a stall, but it's suspicious.
> > > 
> > > Could this be because of misleading ORC data or a logic error in the ORC
> > > unwinder that makes it fall into an infinite loop (unwind_done() never
> > > returning true in arch_stack_walk())?
> > > 
> > > ...because the reported line number doesn't really make sense
> > > as a cause of stalls.
> > 
> > There shouldn't be any way for ORC to hit an infinite loop.  Worst case
> > it would stop after the caller's buffer fills up.  ORC has always been
> > solid, and the RCU usage looks fine to me.  I tend to doubt ORC is at
> > fault here.
> > 
> > Maybe some interaction higher up the stack is causing things to run in a
> > tight loop.
> > 
> > All those debugging options (e.g., DEBUG_VM_MAPLE_TREE, LOCKDEP, KASAN,
> > SLUB_RCU_DEBUG...) could be a factor in slowing things down to a crawl.
> 
> DEBUG_VM_MAPLE_TREE is super heavy, but that comes from validate_mm()
> which would be the last thing to happen before returning, usually.
> 
> I mean surely that would show up in the logs.
> 
> Okay, it's in the second log on the dashboard.
> 
> Yeah, I think it's debug options eventually causing failure.  Apparently
> there's a reproducer for syz now but without the validate_mm().

I don't think it's the debugging options, as removing the validate_mm()
call did not help.

We may want to wait for a C reproducer.
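
To illustrate Josh's point above that the walk stops once the caller's buffer fills up, here is a minimal userspace model in Python. It is a sketch under assumptions, not kernel code: `walk_stack` and `make_bounded_consumer` are hypothetical names, loosely modelled on a walker that hands each return address to a consumer callback and stops when the callback returns false. The endlessly repeating frame list stands in for hypothetical bad unwind data.

```python
from itertools import cycle


def walk_stack(frames, consume_entry):
    """Visit return addresses until the frames run out or the consumer says stop."""
    visited = 0
    for addr in frames:
        visited += 1
        # A zero address or a "stop" from the consumer ends the walk.
        if not addr or not consume_entry(addr):
            break
    return visited


def make_bounded_consumer(buf, size):
    """Return a callback that stores addresses in buf and stops once it is full."""
    def consume_entry(addr):
        if len(buf) >= size:
            return False
        buf.append(addr)
        # Keep walking only while there is still room for another entry.
        return len(buf) < size
    return consume_entry


# Hypothetical corrupt unwind data: the same three frames repeat forever.
looping_frames = cycle([0xFFFFFFFF81000010, 0xFFFFFFFF81000020, 0xFFFFFFFF81000030])

trace = []
steps = walk_stack(looping_frames, make_bounded_consumer(trace, size=64))
print(steps, len(trace))
```

Even with frames that never terminate, the model walk ends after 64 steps because the consumer refuses further entries. In this model a single walk is bounded, which is consistent with looking higher up the stack for whatever keeps re-entering the unwinder.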




* Re: [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)
  2025-08-22  4:15 [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2) syzbot
                   ` (2 preceding siblings ...)
  2025-08-28  2:20 ` Liam R. Howlett
@ 2025-12-19 19:37 ` syzbot
  3 siblings, 0 replies; 11+ messages in thread
From: syzbot @ 2025-12-19 19:37 UTC (permalink / raw)
  To: Liam.Howlett, akpm, bigeasy, david, harry.yoo, jannh, jpoimboe,
	liam.howlett, linux-kernel, linux-mm, lorenzo.stoakes, peterz,
	pfalcato, riel, syzkaller-bugs, vbabka

syzbot has found a reproducer for the following issue on:

HEAD commit:    dd9b004b7ff3 Merge tag 'trace-v6.19-rc1' of git://git.kern..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10010d1a580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=a94030c847137a18
dashboard link: https://syzkaller.appspot.com/bug?extid=8785aaf121cfb2141e0d
compiler:       Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1519cd58580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1438f9b4580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/ea0a8b24838c/disk-dd9b004b.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/67ac69e3e131/vmlinux-dd9b004b.xz
kernel image: https://storage.googleapis.com/syzbot-assets/570521afa03d/bzImage-dd9b004b.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+8785aaf121cfb2141e0d@syzkaller.appspotmail.com

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-1): P6120/1:b..l
rcu: 	(detected by 0, t=10502 jiffies, g=13285, q=486 ncpus=2)
task:dhcpcd          state:R  running task     stack:25432 pid:6120  tgid:6120  ppid:5490   task_flags:0x400040 flags:0x00080000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5256 [inline]
 __schedule+0x14bc/0x5000 kernel/sched/core.c:6863
 preempt_schedule_irq+0xb5/0x150 kernel/sched/core.c:7190
 irqentry_exit+0x5d8/0x660 kernel/entry/common.c:216
 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
RIP: 0010:lock_acquire+0x16c/0x340 kernel/locking/lockdep.c:5872
Code: 00 00 00 00 9c 8f 44 24 30 f7 44 24 30 00 02 00 00 0f 85 cd 00 00 00 f7 44 24 08 00 02 00 00 74 01 fb 65 48 8b 05 d4 38 e0 10 <48> 3b 44 24 58 0f 85 e5 00 00 00 48 83 c4 60 5b 41 5c 41 5d 41 5e
RSP: 0018:ffffc900031970d8 EFLAGS: 00000206
RAX: e8c1bbb3b5be3a00 RBX: 0000000000000000 RCX: e8c1bbb3b5be3a00
RDX: 000000005594689c RSI: ffffffff8d979233 RDI: ffffffff8bc08360
RBP: ffffffff81743f85 R08: ffffffff81743f85 R09: ffffffff8df41a20
R10: dffffc0000000000 R11: ffffffff81ada120 R12: 0000000000000002
R13: ffffffff8df41a20 R14: 0000000000000000 R15: 0000000000000246
 rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
 rcu_read_lock include/linux/rcupdate.h:867 [inline]
 class_rcu_constructor include/linux/rcupdate.h:1195 [inline]
 unwind_next_frame+0xc2/0x23d0 arch/x86/kernel/unwind_orc.c:495
 arch_stack_walk+0x11c/0x150 arch/x86/kernel/stacktrace.c:25
 stack_trace_save+0x9c/0xe0 kernel/stacktrace.c:122
 kasan_save_stack mm/kasan/common.c:56 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
 poison_kmalloc_redzone mm/kasan/common.c:397 [inline]
 __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:414
 kasan_kmalloc include/linux/kasan.h:262 [inline]
 __kmalloc_cache_noprof+0x3e2/0x700 mm/slub.c:5776
 kmalloc_noprof include/linux/slab.h:957 [inline]
 slab_free_hook mm/slub.c:2492 [inline]
 slab_free mm/slub.c:6668 [inline]
 kmem_cache_free+0x16b/0x620 mm/slub.c:6779
 anon_vma_free mm/rmap.c:136 [inline]
 __put_anon_vma+0x12b/0x2d0 mm/rmap.c:2780
 put_anon_vma include/linux/rmap.h:117 [inline]
 unlink_anon_vmas+0x503/0x670 mm/rmap.c:443
 free_pgtables+0x735/0x9d0 mm/memory.c:414
 vms_clear_ptes+0x423/0x530 mm/vma.c:1236
 vms_complete_munmap_vmas+0x206/0x8a0 mm/vma.c:1280
 do_vmi_align_munmap+0x364/0x440 mm/vma.c:1539
 do_vmi_munmap+0x253/0x2e0 mm/vma.c:1587
 __vm_munmap+0x207/0x380 mm/vma.c:3203
 __do_sys_munmap mm/mmap.c:1077 [inline]
 __se_sys_munmap mm/mmap.c:1074 [inline]
 __x64_sys_munmap+0x60/0x70 mm/mmap.c:1074
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fd7d0dad2e7
RSP: 002b:00007ffd60bdc648 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
RAX: ffffffffffffffda RBX: 000055f3a7cc03f0 RCX: 00007fd7d0dad2e7
RDX: 0000000000000002 RSI: 0000000000061018 RDI: 00007fd7d0add000
RBP: 00007ffd60bdc750 R08: 0000000000000030 R09: 000055f3a7cc0fa0
R10: 00007ffd60bdc580 R11: 0000000000000206 R12: 00007ffd60bdc698
R13: 00007fd7d0dc0000 R14: 000055f3a7cc03f0 R15: 0000000000000000
 </TASK>
rcu: rcu_preempt kthread starved for 10455 jiffies! g13285 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:R  running task     stack:27736 pid:16    tgid:16    ppid:2      task_flags:0x208040 flags:0x00080000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5256 [inline]
 __schedule+0x14bc/0x5000 kernel/sched/core.c:6863
 __schedule_loop kernel/sched/core.c:6945 [inline]
 schedule+0x165/0x360 kernel/sched/core.c:6960
 schedule_timeout+0x12b/0x270 kernel/time/sleep_timeout.c:99
 rcu_gp_fqs_loop+0x301/0x1540 kernel/rcu/tree.c:2083
 rcu_gp_kthread+0x99/0x390 kernel/rcu/tree.c:2285
 kthread+0x711/0x8a0 kernel/kthread.c:463
 ret_from_fork+0x599/0xb30 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
 </TASK>
rcu: Stack dump where RCU GP kthread last ran:
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 UID: 0 PID: 6122 Comm: vhost-6117 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
RIP: 0010:do_user_addr_fault+0x1cd/0x1380 arch/x86/mm/fault.c:1293
Code: e8 b8 39 54 00 fb 4c 89 7c 24 38 0f 1f 44 00 00 e8 a8 62 4c 00 44 89 f6 83 e6 40 31 ff 48 89 74 24 30 e8 76 67 4c 00 44 89 f6 <83> e6 02 31 ff 48 89 74 24 40 e8 64 67 4c 00 45 31 ff 4c 89 f6 48
RSP: 0018:ffffc900031b7878 EFLAGS: 00000293
RAX: ffffffff81754fba RBX: 0000000000000200 RCX: ffff88802c92db80
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: dffffc0000000000 R08: ffffffff8f822077 R09: 1ffffffff1f0440e
R10: dffffc0000000000 R11: fffffbfff1f0440f R12: ffffc900031b7968
R13: 0000000000000002 R14: 0000000000000000 R15: ffff88802c92db80
FS:  000055555ad2e500(0000) GS:ffff888125f35000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000002 CR3: 00000000322fa000 CR4: 0000000000350ef0
Call Trace:
 <TASK>
 handle_page_fault arch/x86/mm/fault.c:1476 [inline]
 exc_page_fault+0x82/0x100 arch/x86/mm/fault.c:1532
 asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:618
RIP: 0010:__get_user_nocheck_2+0x6/0x20 arch/x86/lib/getuser.S:130
Code: 01 ca e9 c8 d6 b5 f5 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 01 cb 0f ae e8 <0f> b7 10 31 c0 0f 01 ca e9 98 d6 b5 f5 90 90 90 90 90 90 90 90 90
RSP: 0018:ffffc900031b7a10 EFLAGS: 00050202
RAX: 0000000000000002 RBX: ffff8880244f01e0 RCX: ffff88802c92db80
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880244f01e0
RBP: 0000000000000000 R08: ffffffff8f822077 R09: 1ffffffff1f0440e
R10: dffffc0000000000 R11: fffffbfff1f0440f R12: dffffc0000000000
R13: dffffc0000000000 R14: ffff8880244f0290 R15: ffffc900031b7a18
 vhost_get_avail_idx+0xc4/0x450 drivers/vhost/vhost.c:1531
 vhost_enable_notify+0x303/0x650 drivers/vhost/vhost.c:3245
 vhost_transport_do_send_pkt+0xf55/0x1330 drivers/vhost/vsock.c:125
 vhost_run_work_list+0x14e/0x1e0 drivers/vhost/vhost.c:454
 vhost_task_fn+0x27c/0x430 kernel/vhost_task.c:49
 ret_from_fork+0x599/0xb30 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
 </TASK>


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.


Thread overview: 11+ messages
2025-08-22  4:15 [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2) syzbot
2025-08-22 12:08 ` Lorenzo Stoakes
2025-08-22 13:55   ` Harry Yoo
2025-08-28  0:29     ` Josh Poimboeuf
2025-08-28  1:57       ` Liam R. Howlett
2025-08-28  3:35         ` Liam R. Howlett
2025-08-28  2:05 ` Liam R. Howlett
2025-08-28  2:05   ` syzbot
2025-08-28  2:20 ` Liam R. Howlett
2025-08-28  3:08   ` syzbot
2025-12-19 19:37 ` syzbot
