* [BUG] rcu detected stall in shmem_file_write_iter
@ 2026-02-05 12:57 Zw Tang
2026-02-05 13:44 ` David Hildenbrand (Arm)
2026-02-06 7:37 ` Kunwu Chan
0 siblings, 2 replies; 5+ messages in thread
From: Zw Tang @ 2026-02-05 12:57 UTC (permalink / raw)
To: linux-mm, rcu
Cc: hughd, akpm, david, chrisl, kasong, paulmck, frederic, linux-kernel
Hi,
I am reporting a reproducible RCU stall observed on Linux 6.19.0-rc7,
triggered by a syzkaller C reproducer.
The stall is reported while a userspace task is executing the tmpfs
(shmem) write path. The blocked task is a syz-executor process, and the
RCU report consistently shows it running in the shmem write / folio
allocation path for an extended period of time.
The relevant call trace of the stalled task is:
shmem_file_write_iter
shmem_write_begin
shmem_get_folio_gfp
__folio_batch_add_and_move
folio_batch_move_lru
lru_add
__mod_zone_page_state
The kernel eventually reports:
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks
This suggests that the task spends an excessive amount of time in the
shmem write and folio/LRU accounting path, which prevents the CPU from
reaching a quiescent state and eventually trips the RCU stall detector.
I am not yet certain whether the stall is caused by heavy memory
pressure, LRU/zone accounting contention, or an unintended long-running
critical section in the shmem write path, but the stall is consistently
associated with shmem_file_write_iter().
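
For orientation, the class of workload that drives this path can be
approximated by a tight write loop on a shmem-backed fd. This is a
minimal hypothetical sketch for illustration only (it is not the
syzkaller reproducer linked below); memfd_create() returns an fd backed
by shmem, so every write() goes through shmem_file_write_iter():

#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	static char buf[1 << 16];			/* 64 KiB per write() */
	int fd = memfd_create("shmem-stress", 0);	/* shmem-backed fd */

	if (fd < 0)
		return 1;
	memset(buf, 0xa5, sizeof(buf));
	for (;;) {
		/* each write() runs shmem_write_begin()/shmem_get_folio() */
		if (write(fd, buf, sizeof(buf)) < 0) {
			/* backing store exhausted: overwrite from the start */
			if (lseek(fd, 0, SEEK_SET) < 0)
				return 1;
		}
	}
}
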
Reproducer:
C reproducer: https://pastebin.com/raw/AjQ5a5PL
console output: https://pastebin.com/raw/FyBF1R7b
kernel config: https://pastebin.com/raw/LwALTGZ5
Kernel:
git tree: torvalds/linux
HEAD commit: 63804fed149a6750ffd28610c5c1c98cce6bd377
kernel version: 6.19.0-rc7 (QEMU Ubuntu 24.10)
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P51643
rcu: (detected by 1, t=100002 jiffies, g=470049, q=86 ncpus=2)
task:syz.3.5719 state:R running task stack:25640 pid:51643
tgid:51627 ppid:49386 task_flags:0x400140 flags:0x00080012
Call Trace:
<IRQ>
sched_show_task kernel/sched/core.c:7821 [inline]
sched_show_task+0x357/0x510 kernel/sched/core.c:7796
rcu_print_detail_task_stall_rnp kernel/rcu/tree_stall.h:292 [inline]
print_other_cpu_stall kernel/rcu/tree_stall.h:681 [inline]
check_cpu_stall kernel/rcu/tree_stall.h:856 [inline]
rcu_pending kernel/rcu/tree.c:3667 [inline]
rcu_sched_clock_irq+0x20ab/0x27e0 kernel/rcu/tree.c:2704
update_process_times+0xf4/0x160 kernel/time/timer.c:2474
tick_sched_handle kernel/time/tick-sched.c:298 [inline]
tick_nohz_handler+0x504/0x720 kernel/time/tick-sched.c:319
__run_hrtimer kernel/time/hrtimer.c:1777 [inline]
__hrtimer_run_queues+0x274/0x810 kernel/time/hrtimer.c:1841
hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1903
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1045 [inline]
__sysvec_apic_timer_interrupt+0x82/0x250 arch/x86/kernel/apic/apic.c:1062
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1056 [inline]
sysvec_apic_timer_interrupt+0x6b/0x80 arch/x86/kernel/apic/apic.c:1056
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
RIP: 0010:finish_task_switch+0x128/0x610 kernel/sched/core.c:5118
Code: 02 00 0f 85 67 04 00 00 49 8b 9c 24 98 0a 00 00 48 85 db 0f 85
70 03 00 00 4c 89 e7 e8 61 78 92 02 fb 65 48 8b 1d 68 51 5d 04 <48> 8d
bb e0 0a 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1
RSP: 0018:ffff88802d32f630 EFLAGS: 00000286
RAX: 0000000000000000 RBX: ffff888012496900 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff888012496900 RDI: ffff88806d535b80
RBP: ffff88802d32f670 R08: 0000000000000000 R09: ffffffff817f85a5
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88806d535b80
R13: ffff88800635c600 R14: ffff88800f630f00 R15: ffff888012497374
context_switch kernel/sched/core.c:5263 [inline]
__schedule+0x1293/0x38c0 kernel/sched/core.c:6867
preempt_schedule_irq+0x49/0x70 kernel/sched/core.c:7194
irqentry_exit+0xc1/0x5a0 kernel/entry/common.c:216
asm_sysvec_irq_work+0x1a/0x20 arch/x86/include/asm/idtentry.h:733
RIP: 0010:__mod_zone_page_state+0x12/0xf0 mm/vmstat.c:347
Code: 89 ef e8 b1 53 18 00 e9 54 ff ff ff 66 66 2e 0f 1f 84 00 00 00
00 00 90 f3 0f 1e fa 48 b8 00 00 00 00 00 fc ff df 41 55 41 54 <55> 48
89 fd 48 83 c7 70 53 48 89 f9 48 c1 e9 03 48 83 ec 10 80 3c
RSP: 0018:ffff88802d32f898 EFLAGS: 00000286
RAX: dffffc0000000000 RBX: ffff88800c0c4640 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff88807ffdcc00
RBP: ffffea00014a5a00 R08: ffffffff846c1c01 R09: ffff88806d53b6d0
R10: ffff888006278000 R11: ffff8880062785bb R12: 0000000000000000
R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000001
__update_lru_size include/linux/mm_inline.h:48 [inline]
update_lru_size include/linux/mm_inline.h:56 [inline]
lruvec_add_folio include/linux/mm_inline.h:348 [inline]
lru_add+0x44f/0x890 mm/swap.c:154
folio_batch_move_lru+0x110/0x4c0 mm/swap.c:172
__folio_batch_add_and_move+0x27e/0x7e0 mm/swap.c:196
shmem_alloc_and_add_folio mm/shmem.c:1991 [inline]
shmem_get_folio_gfp.isra.0+0xc49/0x1410 mm/shmem.c:2556
shmem_get_folio mm/shmem.c:2662 [inline]
shmem_write_begin+0x197/0x3b0 mm/shmem.c:3315
generic_perform_write+0x37f/0x800 mm/filemap.c:4314
shmem_file_write_iter+0x10d/0x140 mm/shmem.c:3490
new_sync_write fs/read_write.c:593 [inline]
vfs_write fs/read_write.c:686 [inline]
vfs_write+0xabc/0xe90 fs/read_write.c:666
ksys_write+0x121/0x240 fs/read_write.c:738
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xac/0x330 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f9b5abad69f
Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 f9 92 02 00 48 8b 54
24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d
00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 4c 93 02 00 48
RSP: 002b:00007f9b595eddf0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000010000 RCX: 00007f9b5abad69f
RDX: 0000000000010000 RSI: 00007f9b511ce000 RDI: 0000000000000007
RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000000002f2
R10: 00000000000001ce R11: 0000000000000293 R12: 0000000000000007
R13: 00007f9b595edef0 R14: 00007f9b595edeb0 R15: 00007f9b511ce000
</TASK>
^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG] rcu detected stall in shmem_file_write_iter
  2026-02-05 12:57 [BUG] rcu detected stall in shmem_file_write_iter Zw Tang
@ 2026-02-05 13:44 ` David Hildenbrand (Arm)
  2026-02-06 7:37 ` Kunwu Chan
  1 sibling, 0 replies; 5+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-05 13:44 UTC (permalink / raw)
  To: Zw Tang, linux-mm, rcu
  Cc: hughd, akpm, chrisl, kasong, paulmck, frederic, linux-kernel

On 2/5/26 13:57, Zw Tang wrote:
> Hi,
>
> I am reporting a reproducible RCU stall observed on Linux 6.19.0-rc7,
> triggered by a syzkaller C reproducer.
>
> The stall is reported while a userspace task is executing the tmpfs
> (shmem) write path. The blocked task is a syz-executor process, and the
> RCU report consistently shows it running in the shmem write / folio
> allocation path for an extended period of time.
>
> The relevant call trace of the stalled task is:
>
> shmem_file_write_iter
> shmem_write_begin
> shmem_get_folio_gfp
> __folio_batch_add_and_move
> folio_batch_move_lru
> lru_add
> __mod_zone_page_state
>

lru_add()->lruvec_add_folio() calls
update_lru_size()->__update_lru_size()->__mod_zone_page_state().

On SMP systems we end up in mm/vmstat.c:__mod_zone_page_state(). I
don't see anything that could block in there. Livelocking? Or we're
actually stuck somewhere else and the __mod_zone_page_state is
misleading.

Can you try running with lockdep to see if anything pops up?
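
For reference, the SMP version of __mod_zone_page_state() is
essentially just a per-CPU counter update that occasionally folds the
delta into the global zone counter -- roughly this shape (a simplified
sketch from memory, not the exact 6.19 source):

void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
			   long delta)
{
	struct per_cpu_zonestat __percpu *pcp = zone->per_cpu_zonestats;
	s8 __percpu *p = pcp->vm_stat_diff + item;
	long x, t;

	x = delta + __this_cpu_read(*p);
	t = __this_cpu_read(pcp->stat_threshold);
	if (unlikely(abs(x) > t)) {
		/* fold the accumulated per-CPU delta into the zone counter */
		zone_page_state_add(x, zone, item);
		x = 0;
	}
	__this_cpu_write(*p, x);
}

No locks taken, no loops -- nothing in there to block or spin on.

--
Cheers,

David

^ permalink raw reply	[flat|nested] 5+ messages in thread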
* Re: [BUG] rcu detected stall in shmem_file_write_iter
  2026-02-05 12:57 [BUG] rcu detected stall in shmem_file_write_iter Zw Tang
  2026-02-05 13:44 ` David Hildenbrand (Arm)
@ 2026-02-06 7:37 ` Kunwu Chan
  [not found] ` <PS1PPF7E1D7501F6FE46DF307FFB8086E2FAB66A@PS1PPF7E1D7501F.apcprd02.prod.outlook.com>
  1 sibling, 1 reply; 5+ messages in thread
From: Kunwu Chan @ 2026-02-06 7:37 UTC (permalink / raw)
  To: Zw Tang, linux-mm, rcu
  Cc: hughd, akpm, david, chrisl, kasong, paulmck, frederic, linux-kernel

On 2/5/26 20:57, Zw Tang wrote:
> Hi,
>
> I am reporting a reproducible RCU stall observed on Linux 6.19.0-rc7,
> triggered by a syzkaller C reproducer.
>
> The stall is reported while a userspace task is executing the tmpfs
> (shmem) write path. The blocked task is a syz-executor process, and the
> RCU report consistently shows it running in the shmem write / folio
> allocation path for an extended period of time.
>
> The relevant call trace of the stalled task is:
>
> shmem_file_write_iter
> shmem_write_begin
> shmem_get_folio_gfp
> __folio_batch_add_and_move
> folio_batch_move_lru
> lru_add
> __mod_zone_page_state
>
> The kernel eventually reports:
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks
>
> This suggests that the task spends an excessive amount of time in the
> shmem write and folio/LRU accounting path, which prevents the CPU from
> reaching a quiescent state and eventually trips the RCU stall detector.
>
> I am not yet certain whether the stall is caused by heavy memory
> pressure, LRU/zone accounting contention, or an unintended long-running
> critical section in the shmem write path, but the stall is consistently
> associated with shmem_file_write_iter().
>
> Reproducer:
>
> C reproducer: https://pastebin.com/raw/AjQ5a5PL
> console output: https://pastebin.com/raw/FyBF1R7b
> kernel config: https://pastebin.com/raw/LwALTGZ5
>
> Kernel:
> git tree: torvalds/linux
> HEAD commit: 63804fed149a6750ffd28610c5c1c98cce6bd377
> kernel version: 6.19.0-rc7 (QEMU Ubuntu 24.10)
>
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P51643
> rcu: (detected by 1, t=100002 jiffies, g=470049, q=86 ncpus=2)
> task:syz.3.5719 state:R running task stack:25640 pid:51643
> tgid:51627 ppid:49386 task_flags:0x400140 flags:0x00080012
> Call Trace:
> <IRQ>
> sched_show_task kernel/sched/core.c:7821 [inline]
> sched_show_task+0x357/0x510 kernel/sched/core.c:7796
> rcu_print_detail_task_stall_rnp kernel/rcu/tree_stall.h:292 [inline]
> print_other_cpu_stall kernel/rcu/tree_stall.h:681 [inline]
> check_cpu_stall kernel/rcu/tree_stall.h:856 [inline]
> rcu_pending kernel/rcu/tree.c:3667 [inline]
> rcu_sched_clock_irq+0x20ab/0x27e0 kernel/rcu/tree.c:2704
> update_process_times+0xf4/0x160 kernel/time/timer.c:2474
> tick_sched_handle kernel/time/tick-sched.c:298 [inline]
> tick_nohz_handler+0x504/0x720 kernel/time/tick-sched.c:319
> __run_hrtimer kernel/time/hrtimer.c:1777 [inline]
> __hrtimer_run_queues+0x274/0x810 kernel/time/hrtimer.c:1841
> hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1903
> local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1045 [inline]
> __sysvec_apic_timer_interrupt+0x82/0x250 arch/x86/kernel/apic/apic.c:1062
> instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1056 [inline]
> sysvec_apic_timer_interrupt+0x6b/0x80 arch/x86/kernel/apic/apic.c:1056
> </IRQ>
> <TASK>
> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
> RIP: 0010:finish_task_switch+0x128/0x610 kernel/sched/core.c:5118
> Code: 02 00 0f 85 67 04 00 00 49 8b 9c 24 98 0a 00 00 48 85 db 0f 85
> 70 03 00 00 4c 89 e7 e8 61 78 92 02 fb 65 48 8b 1d 68 51 5d 04 <48> 8d
> bb e0 0a 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1
> RSP: 0018:ffff88802d32f630 EFLAGS: 00000286
> RAX: 0000000000000000 RBX: ffff888012496900 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffff888012496900 RDI: ffff88806d535b80
> RBP: ffff88802d32f670 R08: 0000000000000000 R09: ffffffff817f85a5
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff88806d535b80
> R13: ffff88800635c600 R14: ffff88800f630f00 R15: ffff888012497374
> context_switch kernel/sched/core.c:5263 [inline]
> __schedule+0x1293/0x38c0 kernel/sched/core.c:6867
> preempt_schedule_irq+0x49/0x70 kernel/sched/core.c:7194
> irqentry_exit+0xc1/0x5a0 kernel/entry/common.c:216
> asm_sysvec_irq_work+0x1a/0x20 arch/x86/include/asm/idtentry.h:733
> RIP: 0010:__mod_zone_page_state+0x12/0xf0 mm/vmstat.c:347
> Code: 89 ef e8 b1 53 18 00 e9 54 ff ff ff 66 66 2e 0f 1f 84 00 00 00
> 00 00 90 f3 0f 1e fa 48 b8 00 00 00 00 00 fc ff df 41 55 41 54 <55> 48
> 89 fd 48 83 c7 70 53 48 89 f9 48 c1 e9 03 48 83 ec 10 80 3c
> RSP: 0018:ffff88802d32f898 EFLAGS: 00000286
> RAX: dffffc0000000000 RBX: ffff88800c0c4640 RCX: 0000000000000000
> RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff88807ffdcc00
> RBP: ffffea00014a5a00 R08: ffffffff846c1c01 R09: ffff88806d53b6d0
> R10: ffff888006278000 R11: ffff8880062785bb R12: 0000000000000000
> R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000001
> __update_lru_size include/linux/mm_inline.h:48 [inline]
> update_lru_size include/linux/mm_inline.h:56 [inline]
> lruvec_add_folio include/linux/mm_inline.h:348 [inline]
> lru_add+0x44f/0x890 mm/swap.c:154
> folio_batch_move_lru+0x110/0x4c0 mm/swap.c:172
> __folio_batch_add_and_move+0x27e/0x7e0 mm/swap.c:196
> shmem_alloc_and_add_folio mm/shmem.c:1991 [inline]
> shmem_get_folio_gfp.isra.0+0xc49/0x1410 mm/shmem.c:2556
> shmem_get_folio mm/shmem.c:2662 [inline]
> shmem_write_begin+0x197/0x3b0 mm/shmem.c:3315
> generic_perform_write+0x37f/0x800 mm/filemap.c:4314
> shmem_file_write_iter+0x10d/0x140 mm/shmem.c:3490
> new_sync_write fs/read_write.c:593 [inline]
> vfs_write fs/read_write.c:686 [inline]
> vfs_write+0xabc/0xe90 fs/read_write.c:666
> ksys_write+0x121/0x240 fs/read_write.c:738
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xac/0x330 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
> RIP: 0033:0x7f9b5abad69f
> Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 f9 92 02 00 48 8b 54
> 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d
> 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 4c 93 02 00 48
> RSP: 002b:00007f9b595eddf0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 0000000000010000 RCX: 00007f9b5abad69f
> RDX: 0000000000010000 RSI: 00007f9b511ce000 RDI: 0000000000000007
> RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000000002f2
> R10: 00000000000001ce R11: 0000000000000293 R12: 0000000000000007
> R13: 00007f9b595edef0 R14: 00007f9b595edeb0 R15: 00007f9b511ce000
> </TASK>
>

Hi Zw,

__mod_zone_page_state() itself doesn't appear to block, so the
reported frame is likely just where the task was sampled. Given the
task stays in running state, this looks more like a long CPU-bound
section rather than a blocking issue.

With heavy shmem writes, we may be spending significant time in the
folio allocation/LRU paths (for example folio_batch_move_lru() or
alloc_pages() slowpath with reclaim/compaction), which can run for
quite a while without hitting a reschedule point and thus starve RCU.

Could you try enabling lockdep as David suggested? It would also help
to collect some tracing or perf data around the allocation/LRU paths
to see where the CPU time is actually spent.
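
For reference, the batching shape involved here is roughly the
following (a simplified sketch of the mm/swap.c idiom, not the exact
6.19 code): folios are queued in a per-CPU folio_batch and only moved
onto the LRU lists in bulk, under the lruvec lock with IRQs disabled,
once the batch fills up:

typedef void (*move_fn_t)(struct lruvec *lruvec, struct folio *folio);

static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
{
	struct lruvec *lruvec = NULL;
	unsigned long flags = 0;
	int i;

	for (i = 0; i < folio_batch_count(fbatch); i++) {
		struct folio *folio = fbatch->folios[i];

		/* takes (or re-takes) the lruvec lock with IRQs disabled */
		folio_lruvec_relock_irqsave(folio, &lruvec, &flags);
		move_fn(lruvec, folio);
		folio_set_lru(folio);
	}
	if (lruvec)
		unlock_page_lruvec_irqrestore(lruvec, flags);
	folios_put(fbatch);
}

static void __folio_batch_add_and_move(struct folio_batch *fbatch,
				       struct folio *folio, move_fn_t move_fn)
{
	/* queue the folio; flush the whole batch to the LRU once full */
	if (!folio_batch_add(fbatch, folio))
		folio_batch_move_lru(fbatch, move_fn);
}

Each flush is bounded by the batch size (a few dozen folios), so a
single flush is short; the concern is rather the accumulated time
across many iterations of the surrounding write loop without a
reschedule point.

Thanx,
Kunwu

^ permalink raw reply	[flat|nested] 5+ messages in thread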
[parent not found: <PS1PPF7E1D7501F6FE46DF307FFB8086E2FAB66A@PS1PPF7E1D7501F.apcprd02.prod.outlook.com>]
* Re: [BUG] rcu detected stall in shmem_file_write_iter
       [not found] ` <PS1PPF7E1D7501F6FE46DF307FFB8086E2FAB66A@PS1PPF7E1D7501F.apcprd02.prod.outlook.com>
@ 2026-02-06 12:16 ` Zw Tang
  2026-02-09 7:34 ` Kunwu Chan
  0 siblings, 1 reply; 5+ messages in thread
From: Zw Tang @ 2026-02-06 12:16 UTC (permalink / raw)
  To: Kunwu Chan, linux-mm, rcu
  Cc: hughd, akpm, david, chrisl, kasong, paulmck, frederic, linux-kernel

Hi David, hi Kunwu,

thanks a lot for the suggestions.

I reran the reproducer with CONFIG_PROVE_LOCKING=y, CONFIG_LOCKDEP=y,
CONFIG_DEBUG_LOCK_ALLOC=y enabled. Based on the lockdep-enabled run,
here is what I can clarify:

Lockdep does not report any lock inversion, recursive locking, or
circular dependency.

The samples showing __mod_zone_page_state() do not appear to indicate
a blocking point; this frame indeed seems to be just where the task
was sampled.

From the timeline of the reports, the earliest problematic behavior
appears before the MM/LRU-heavy paths. In the first hung-task report,
multiple repro1 threads are already blocked in:

down_write()
 └─ rwbase_write_lock()
    └─ __rt_mutex_slowlock_locked()
       └─ rt_mutex_schedule()

via the do_vfs_ioctl() → perf_fasync() path, and are in D state for
more than 143 seconds at that point.

After several threads are stuck there, the system degrades further:
other threads remain in R state, spending long, uninterrupted time in
MM allocation / LRU paths (alloc_pages(), get_page_from_freelist(),
__handle_mm_fault()), without hitting reschedule points. This then
leads to RCU preempt stalls, and eventually workqueue lockups
(e.g. vmstat_shepherd, do_cache_clean, wb_workfn).

Lockdep's "show all locks held" output does not show the blocked
repro1 threads holding any MM/LRU/zone locks themselves; they
typically only hold the filesystem mutex at that point, which suggests
the contended RT rwsem is held elsewhere.

Overall, this currently looks less like a single blocking bug in
__mod_zone_page_state(), and more like a PREEMPT_RT-specific
starvation scenario, where long-held RT rwsems in the ioctl/perf path
combined with long CPU-bound MM/LRU execution amplify into RCU
starvation and workqueue lockups.

Below is the earliest hung-task report from the lockdep-enabled run
for reference:

[386.499937] INFO: task repro1:2066 blocked for more than 143 seconds.
[386.499956] Not tainted 6.19.0-rc7 #4
[386.499964] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[386.499970] task:repro1 state:D stack:28400 pid:2066 tgid:2066 ppid:293 2
[386.500022] Call Trace:
[386.500027] <TASK>
[386.500037] __schedule+0x1198/0x3f00
[386.500069] ? io_schedule_timeout+0x80/0x80
[386.500088] ? kvm_sched_clock_read+0x16/0x20
[386.500111] ? local_clock_noinstr+0xf/0xc0
[386.500125] ? __rt_mutex_slowlock_locked.constprop.0+0xecd/0x30c0
[386.500148] rt_mutex_schedule+0x9f/0xe0
[386.500171] __rt_mutex_slowlock_locked.constprop.0+0xedc/0x30c0
[386.500197] ? down_write_trylock+0x1a0/0x1a0
[386.500222] ? lock_acquired+0xbd/0x340
[386.500245] rwbase_write_lock+0x744/0xa80
[386.500266] ? perf_fasync+0xc0/0x130
[386.500284] ? rt_mutex_adjust_prio_chain.isra.0+0x3240/0x3240
[386.500304] ? kvm_sched_clock_read+0x16/0x20
[386.500329] ? perf_fasync+0xc0/0x130
[386.500344] ? local_clock+0x10/0x20
[386.500364] ? lock_contended+0x189/0x420
[386.500385] down_write+0x6e/0x1e0
[386.500405] perf_fasync+0xc0/0x130
[386.500421] ? perf_cgroup_css_free+0x50/0x50
[386.500440] do_vfs_ioctl+0x9b9/0x1480
[386.500457] ? lock_vma_under_rcu+0x7ee/0xd90
[386.500475] ? ioctl_file_clone+0xf0/0xf0
[386.500490] ? lock_is_held_type+0xa0/0x110
[386.500506] ? handle_mm_fault+0x5a6/0x9d0
[386.500526] ? kvm_sched_clock_read+0x16/0x20
[386.502053] ? local_clock_noinstr+0xf/0xc0
[386.502073] ? handle_mm_fault+0x5a6/0x9d0
[386.502092] ? exc_page_fault+0xb0/0x180
[386.502106] ? kvm_sched_clock_read+0x16/0x20
[386.502129] ? local_clock_noinstr+0xf/0xc0
[386.502142] ? exc_page_fault+0xb0/0x180
[386.502154] ? local_clock+0x10/0x20
[386.502174] ? lock_release+0x258/0x3c0
[386.502196] ? irqentry_exit+0xf0/0x6d0
[386.502213] __x64_sys_ioctl+0x112/0x220
[386.502232] do_syscall_64+0xc3/0x430
[386.502253] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[386.502269] RIP: 0033:0x7f62f7922fc9
[386.502351] </TASK>
[386.502357] INFO: task repro1:2072 blocked for more than 143 seconds.
[386.502366] Not tainted 6.19.0-rc7 #4
[386.502373] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[386.502378] task:repro1 state:D stack:28400 pid:2072 tgid:2072 ppid:294 2
[386.502427] Call Trace:
[386.502431] <TASK>
[386.502439] __schedule+0x1198/0x3f00
[386.502463] ? io_schedule_timeout+0x80/0x80
[386.502483] ? mark_held_locks+0x50/0x80
[386.502505] rt_mutex_schedule+0x9f/0xe0
[386.502527] __rt_mutex_slowlock_locked.constprop.0+0xedc/0x30c0
[386.503218] ? down_write_trylock+0x1a0/0x1a0
[386.503246] ? lock_acquired+0xbd/0x340
[386.503269] rwbase_write_lock+0x744/0xa80
[386.503290] ? perf_fasync+0xc0/0x130
[386.503306] ? rt_mutex_adjust_prio_chain.isra.0+0x3240/0x3240
[386.503327] ? kvm_sched_clock_read+0x16/0x20
[386.503351] ? perf_fasync+0xc0/0x130
[386.503366] ? local_clock+0x10/0x20
[386.503386] ? lock_contended+0x189/0x420
[386.503407] down_write+0x6e/0x1e0
[386.503427] perf_fasync+0xc0/0x130
[386.503442] ? perf_cgroup_css_free+0x50/0x50
[386.503461] do_vfs_ioctl+0x9b9/0x1480
[386.503476] ? lock_vma_under_rcu+0x7ee/0xd90
[386.503493] ? ioctl_file_clone+0xf0/0xf0
[386.503508] ? lock_is_held_type+0xa0/0x110
[386.503524] ? handle_mm_fault+0x5a6/0x9d0
[386.503543] ? kvm_sched_clock_read+0x16/0x20
[386.504012] ? local_clock_noinstr+0xf/0xc0
[386.504049] ? exc_page_fault+0xb0/0x180
[386.504312] ? irqentry_exit+0xf0/0x6d0
[386.504330] __x64_sys_ioctl+0x112/0x220
[386.504369] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[386.504464] </TASK>
[386.504470] INFO: task repro1:2073 blocked for more than 143 seconds.
[386.504491] task:repro1 state:D stack:28400 pid:2073 tgid:2073 ppid:292 2
[386.504540] Call Trace:
[386.504544] <TASK>
[386.505300] __schedule+0x1198/0x3f00
[386.505347] ? mark_held_locks+0x50/0x80
[386.505369] rt_mutex_schedule+0x9f/0xe0
[386.505391] __rt_mutex_slowlock_locked.constprop.0+0xedc/0x30c0
[386.505464] rwbase_write_lock+0x744/0xa80
[386.505988] down_write+0x6e/0x1e0
[386.506042] do_vfs_ioctl+0x9b9/0x1480
[386.506301] __x64_sys_ioctl+0x112/0x220
[386.506340] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[386.506434] </TASK>
[386.506442] Showing all locks held in the system:
[386.506447] 4 locks held by pr/legacy/16:
[386.506456] 1 lock held by khungtaskd/37:
[386.506464] #0: ffffffff85041540 (rcu_read_lock){....}-{1:3}
[386.506503] 1 lock held by in:imklog/196:
[386.506513] 1 lock held by repro1/2040:
[386.506522] 1 lock held by repro1/2066:
[386.506532] #0: ffff88800784bc50 (&sb->s_type->i_mutex_key#17)
[386.507276] 1 lock held by repro1/2072:
[386.507284] #0: ffff88800784bc50 (&sb->s_type->i_mutex_key#17)
[386.507321] 1 lock held by repro1/2073:
[386.507328] #0: ffff88800784bc50 (&sb->s_type->i_mutex_key#17)
[427.459692] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 40s!
[427.459779] workqueue events:
[427.459809] pending: vmstat_shepherd, e1000_watchdog
[427.460020] workqueue events_freezable_pwr_efficient:
[427.460020] in-flight: disk_events_workfn
[427.460052] workqueue writeback:
[427.460084] in-flight: wb_workfn
[427.460231] Showing backtraces of running workers in stalled CPU-bound worker pools
Message from syslogd@syzkaller at Feb 6 10:27:59 ...
kernel:[ 427.459692] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuc!
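
For context on where the writers block: perf_fasync() takes the inode
lock around fasync_helper(), i.e. a down_write() of i_rwsem, which on
PREEMPT_RT goes through rwbase_write_lock() as in the traces above.
This is a simplified sketch of kernel/events/core.c from memory, not
verified against this exact tree:

static int perf_fasync(int fd, struct file *filp, int on)
{
	struct inode *inode = file_inode(filp);
	struct perf_event *event = filp->private_data;
	int retval;

	/* all fasync state changes serialize on this inode's i_rwsem */
	inode_lock(inode);
	retval = fasync_helper(fd, filp, on, &event->fasync);
	inode_unlock(inode);

	if (retval < 0)
		return retval;
	return 0;
}

This is the down_write() the blocked repro1 threads sit in; I have not
confirmed whether it is the same rwsem instance shown in the "locks
held" list above.

Thanks
Zw Tang

^ permalink raw reply	[flat|nested] 5+ messages in thread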
* Re: [BUG] rcu detected stall in shmem_file_write_iter
  2026-02-06 12:16 ` Re: " Zw Tang
@ 2026-02-09 7:34 ` Kunwu Chan
  0 siblings, 0 replies; 5+ messages in thread
From: Kunwu Chan @ 2026-02-09 7:34 UTC (permalink / raw)
  To: Zw Tang, linux-mm, rcu
  Cc: hughd, akpm, david, chrisl, kasong, paulmck, frederic, linux-kernel

On 2/6/26 20:16, Zw Tang wrote:
> Hi David, hi Kunwu,
>
> thanks a lot for the suggestions.
>
> I reran the reproducer with CONFIG_PROVE_LOCKING=y, CONFIG_LOCKDEP=y,
> CONFIG_DEBUG_LOCK_ALLOC=y enabled. Based on the lockdep-enabled run,
> here is what I can clarify:
>
> Lockdep does not report any lock inversion, recursive locking, or
> circular dependency.
>
> The samples showing __mod_zone_page_state() do not appear to indicate
> a blocking point; this frame indeed seems to be just where the task
> was sampled.

Thanks for the lockdep-enabled rerun. Agreed that
__mod_zone_page_state() is most likely just a sampling point.

> From the timeline of the reports, the earliest problematic behavior
> appears before the MM/LRU-heavy paths. In the first hung-task report,
> multiple repro1 threads are already blocked in:
>
> down_write()
>  └─ rwbase_write_lock()
>     └─ __rt_mutex_slowlock_locked()
>        └─ rt_mutex_schedule()
>
> via the do_vfs_ioctl() → perf_fasync() path, and are in D state for
> more than 143 seconds at that point.
>
> After several threads are stuck there, the system degrades further:
> other threads remain in R state, spending long, uninterrupted time in
> MM allocation / LRU paths (alloc_pages(), get_page_from_freelist(),
> __handle_mm_fault()), without hitting reschedule points. This then
> leads to RCU preempt stalls, and eventually workqueue lockups
> (e.g. vmstat_shepherd, do_cache_clean, wb_workfn).
>
> Lockdep's "show all locks held" output does not show the blocked
> repro1 threads holding any MM/LRU/zone locks themselves; they
> typically only hold the filesystem mutex at that point, which suggests
> the contended RT rwsem is held elsewhere.
>
> Overall, this currently looks less like a single blocking bug in
> __mod_zone_page_state(), and more like a PREEMPT_RT-specific
> starvation scenario, where long-held RT rwsems in the ioctl/perf path
> combined with long CPU-bound MM/LRU execution amplify into RCU
> starvation and workqueue lockups.
>
> Below is the earliest hung-task report from the lockdep-enabled run
> for reference:
>
> [386.499937] INFO: task repro1:2066 blocked for more than 143 seconds.
> [386.499956] Not tainted 6.19.0-rc7 #4
> [386.499964] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [386.499970] task:repro1 state:D stack:28400 pid:2066 tgid:2066 ppid:293 2

The earliest hung tasks are blocked in perf_fasync() at inode_lock()
(down_write(&inode->i_rwsem)), which indicates heavy inode rwsem
contention. However, the waiter stacks alone don't identify the lock
holder.

To move this forward, could you capture a SysRq-T (and optionally
SysRq-w) at the time of the hang so we can inspect the system state
and help identify the lock holder/CPU hog, plus any PREEMPT_RT
PI/owner-chain information for the underlying rt_mutex/rwsem (if
available)?

Thanx,
Kunwu

> [386.500022] Call Trace:
> [386.500027] <TASK>
> [386.500037] __schedule+0x1198/0x3f00
> [386.500069] ? io_schedule_timeout+0x80/0x80
> [386.500088] ? kvm_sched_clock_read+0x16/0x20
> [386.500111] ? local_clock_noinstr+0xf/0xc0
> [386.500125] ? __rt_mutex_slowlock_locked.constprop.0+0xecd/0x30c0
> [386.500148] rt_mutex_schedule+0x9f/0xe0
> [386.500171] __rt_mutex_slowlock_locked.constprop.0+0xedc/0x30c0
> [386.500197] ? down_write_trylock+0x1a0/0x1a0
> [386.500222] ? lock_acquired+0xbd/0x340
> [386.500245] rwbase_write_lock+0x744/0xa80
> [386.500266] ? perf_fasync+0xc0/0x130
> [386.500284] ? rt_mutex_adjust_prio_chain.isra.0+0x3240/0x3240
> [386.500304] ? kvm_sched_clock_read+0x16/0x20
> [386.500329] ? perf_fasync+0xc0/0x130
> [386.500344] ? local_clock+0x10/0x20
> [386.500364] ? lock_contended+0x189/0x420
> [386.500385] down_write+0x6e/0x1e0
> [386.500405] perf_fasync+0xc0/0x130
> [386.500421] ? perf_cgroup_css_free+0x50/0x50
> [386.500440] do_vfs_ioctl+0x9b9/0x1480
> [386.500457] ? lock_vma_under_rcu+0x7ee/0xd90
> [386.500475] ? ioctl_file_clone+0xf0/0xf0
> [386.500490] ? lock_is_held_type+0xa0/0x110
> [386.500506] ? handle_mm_fault+0x5a6/0x9d0
> [386.500526] ? kvm_sched_clock_read+0x16/0x20
> [386.502053] ? local_clock_noinstr+0xf/0xc0
> [386.502073] ? handle_mm_fault+0x5a6/0x9d0
> [386.502092] ? exc_page_fault+0xb0/0x180
> [386.502106] ? kvm_sched_clock_read+0x16/0x20
> [386.502129] ? local_clock_noinstr+0xf/0xc0
> [386.502142] ? exc_page_fault+0xb0/0x180
> [386.502154] ? local_clock+0x10/0x20
> [386.502174] ? lock_release+0x258/0x3c0
> [386.502196] ? irqentry_exit+0xf0/0x6d0
> [386.502213] __x64_sys_ioctl+0x112/0x220
> [386.502232] do_syscall_64+0xc3/0x430
> [386.502253] entry_SYSCALL_64_after_hwframe+0x4b/0x53
> [386.502269] RIP: 0033:0x7f62f7922fc9
> [386.502351] </TASK>
> [386.502357] INFO: task repro1:2072 blocked for more than 143 seconds.
> [386.502366] Not tainted 6.19.0-rc7 #4
> [386.502373] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [386.502378] task:repro1 state:D stack:28400 pid:2072 tgid:2072 ppid:294 2
> [386.502427] Call Trace:
> [386.502431] <TASK>
> [386.502439] __schedule+0x1198/0x3f00
> [386.502463] ? io_schedule_timeout+0x80/0x80
> [386.502483] ? mark_held_locks+0x50/0x80
> [386.502505] rt_mutex_schedule+0x9f/0xe0
> [386.502527] __rt_mutex_slowlock_locked.constprop.0+0xedc/0x30c0
> [386.503218] ? down_write_trylock+0x1a0/0x1a0
> [386.503246] ? lock_acquired+0xbd/0x340
> [386.503269] rwbase_write_lock+0x744/0xa80
> [386.503290] ? perf_fasync+0xc0/0x130
> [386.503306] ? rt_mutex_adjust_prio_chain.isra.0+0x3240/0x3240
> [386.503327] ? kvm_sched_clock_read+0x16/0x20
> [386.503351] ? perf_fasync+0xc0/0x130
> [386.503366] ? local_clock+0x10/0x20
> [386.503386] ? lock_contended+0x189/0x420
> [386.503407] down_write+0x6e/0x1e0
> [386.503427] perf_fasync+0xc0/0x130
> [386.503442] ? perf_cgroup_css_free+0x50/0x50
> [386.503461] do_vfs_ioctl+0x9b9/0x1480
> [386.503476] ? lock_vma_under_rcu+0x7ee/0xd90
> [386.503493] ? ioctl_file_clone+0xf0/0xf0
> [386.503508] ? lock_is_held_type+0xa0/0x110
> [386.503524] ? handle_mm_fault+0x5a6/0x9d0
> [386.503543] ? kvm_sched_clock_read+0x16/0x20
> [386.504012] ? local_clock_noinstr+0xf/0xc0
> [386.504049] ? exc_page_fault+0xb0/0x180
> [386.504312] ? irqentry_exit+0xf0/0x6d0
> [386.504330] __x64_sys_ioctl+0x112/0x220
> [386.504369] entry_SYSCALL_64_after_hwframe+0x4b/0x53
> [386.504464] </TASK>
> [386.504470] INFO: task repro1:2073 blocked for more than 143 seconds.
> [386.504491] task:repro1 state:D stack:28400 pid:2073 tgid:2073 ppid:292 2
> [386.504540] Call Trace:
> [386.504544] <TASK>
> [386.505300] __schedule+0x1198/0x3f00
> [386.505347] ? mark_held_locks+0x50/0x80
> [386.505369] rt_mutex_schedule+0x9f/0xe0
> [386.505391] __rt_mutex_slowlock_locked.constprop.0+0xedc/0x30c0
> [386.505464] rwbase_write_lock+0x744/0xa80
> [386.505988] down_write+0x6e/0x1e0
> [386.506042] do_vfs_ioctl+0x9b9/0x1480
> [386.506301] __x64_sys_ioctl+0x112/0x220
> [386.506340] entry_SYSCALL_64_after_hwframe+0x4b/0x53
> [386.506434] </TASK>
> [386.506442] Showing all locks held in the system:
> [386.506447] 4 locks held by pr/legacy/16:
> [386.506456] 1 lock held by khungtaskd/37:
> [386.506464] #0: ffffffff85041540 (rcu_read_lock){....}-{1:3}
> [386.506503] 1 lock held by in:imklog/196:
> [386.506513] 1 lock held by repro1/2040:
> [386.506522] 1 lock held by repro1/2066:
> [386.506532] #0: ffff88800784bc50 (&sb->s_type->i_mutex_key#17)
> [386.507276] 1 lock held by repro1/2072:
> [386.507284] #0: ffff88800784bc50 (&sb->s_type->i_mutex_key#17)
> [386.507321] 1 lock held by repro1/2073:
> [386.507328] #0: ffff88800784bc50 (&sb->s_type->i_mutex_key#17)
> [427.459692] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 40s!
> [427.459779] workqueue events:
> [427.459809] pending: vmstat_shepherd, e1000_watchdog
> [427.460020] workqueue events_freezable_pwr_efficient:
> [427.460020] in-flight: disk_events_workfn
> [427.460052] workqueue writeback:
> [427.460084] in-flight: wb_workfn
> [427.460231] Showing backtraces of running workers in stalled CPU-bound worker pools
> Message from syslogd@syzkaller at Feb 6 10:27:59 ...
> kernel:[ 427.459692] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuc!
>
> Thanks
> Zw Tang

^ permalink raw reply	[flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-02-09 7:35 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-05 12:57 [BUG] rcu detected stall in shmem_file_write_iter Zw Tang
2026-02-05 13:44 ` David Hildenbrand (Arm)
2026-02-06 7:37 ` Kunwu Chan
[not found] ` <PS1PPF7E1D7501F6FE46DF307FFB8086E2FAB66A@PS1PPF7E1D7501F.apcprd02.prod.outlook.com>
2026-02-06 12:16 ` Re: " Zw Tang
2026-02-09 7:34 ` Kunwu Chan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox