* [BUG] rcu detected stall in shmem_file_write_iter
@ 2026-02-05 12:57 Zw Tang
2026-02-05 13:44 ` David Hildenbrand (Arm)
2026-02-06 7:37 ` Kunwu Chan
0 siblings, 2 replies; 5+ messages in thread
From: Zw Tang @ 2026-02-05 12:57 UTC (permalink / raw)
To: linux-mm, rcu
Cc: hughd, akpm, david, chrisl, kasong, paulmck, frederic, linux-kernel
Hi,
I am reporting a reproducible RCU stall observed on Linux 6.19.0-rc7,
triggered by a syzkaller C reproducer.
The stall is reported while a userspace task is executing the tmpfs
(shmem) write path. The blocked task is a syz-executor process, and the
RCU report consistently shows it running in the shmem write / folio
allocation path for an extended period of time.
The relevant call trace of the stalled task is:
shmem_file_write_iter
shmem_write_begin
shmem_get_folio_gfp
__folio_batch_add_and_move
folio_batch_move_lru
lru_add
__mod_zone_page_state
The kernel eventually reports:
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks
This suggests that the task spends an excessive amount of time in the
shmem write and folio/LRU accounting path, which prevents the CPU from
reaching a quiescent state and eventually trips the RCU stall detector.
I am not yet certain whether the stall is caused by heavy memory
pressure, LRU/zone accounting contention, or an unintended long-running
critical section in the shmem write path, but the stall is consistently
associated with shmem_file_write_iter().
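
For orientation, the class of workload that drives this path can be
approximated by a tight write loop on a shmem-backed fd. This is a
minimal hypothetical sketch for illustration only (it is not the
syzkaller reproducer linked below); memfd_create() returns an fd backed
by shmem, so every write() goes through shmem_file_write_iter():

#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	static char buf[1 << 16];			/* 64 KiB per write() */
	int fd = memfd_create("shmem-stress", 0);	/* shmem-backed fd */

	if (fd < 0)
		return 1;
	memset(buf, 0xa5, sizeof(buf));
	for (;;) {
		/* each write() runs shmem_write_begin()/shmem_get_folio() */
		if (write(fd, buf, sizeof(buf)) < 0) {
			/* backing store exhausted: overwrite from the start */
			if (lseek(fd, 0, SEEK_SET) < 0)
				return 1;
		}
	}
}
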
Reproducer:
C reproducer: https://pastebin.com/raw/AjQ5a5PL
console output: https://pastebin.com/raw/FyBF1R7b
kernel config: https://pastebin.com/raw/LwALTGZ5
Kernel:
git tree: torvalds/linux
HEAD commit: 63804fed149a6750ffd28610c5c1c98cce6bd377
kernel version: 6.19.0-rc7 (QEMU Ubuntu 24.10)
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P51643
rcu: (detected by 1, t=100002 jiffies, g=470049, q=86 ncpus=2)
task:syz.3.5719 state:R running task stack:25640 pid:51643
tgid:51627 ppid:49386 task_flags:0x400140 flags:0x00080012
Call Trace:
<IRQ>
sched_show_task kernel/sched/core.c:7821 [inline]
sched_show_task+0x357/0x510 kernel/sched/core.c:7796
rcu_print_detail_task_stall_rnp kernel/rcu/tree_stall.h:292 [inline]
print_other_cpu_stall kernel/rcu/tree_stall.h:681 [inline]
check_cpu_stall kernel/rcu/tree_stall.h:856 [inline]
rcu_pending kernel/rcu/tree.c:3667 [inline]
rcu_sched_clock_irq+0x20ab/0x27e0 kernel/rcu/tree.c:2704
update_process_times+0xf4/0x160 kernel/time/timer.c:2474
tick_sched_handle kernel/time/tick-sched.c:298 [inline]
tick_nohz_handler+0x504/0x720 kernel/time/tick-sched.c:319
__run_hrtimer kernel/time/hrtimer.c:1777 [inline]
__hrtimer_run_queues+0x274/0x810 kernel/time/hrtimer.c:1841
hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1903
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1045 [inline]
__sysvec_apic_timer_interrupt+0x82/0x250 arch/x86/kernel/apic/apic.c:1062
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1056 [inline]
sysvec_apic_timer_interrupt+0x6b/0x80 arch/x86/kernel/apic/apic.c:1056
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
RIP: 0010:finish_task_switch+0x128/0x610 kernel/sched/core.c:5118
Code: 02 00 0f 85 67 04 00 00 49 8b 9c 24 98 0a 00 00 48 85 db 0f 85
70 03 00 00 4c 89 e7 e8 61 78 92 02 fb 65 48 8b 1d 68 51 5d 04 <48> 8d
bb e0 0a 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1
RSP: 0018:ffff88802d32f630 EFLAGS: 00000286
RAX: 0000000000000000 RBX: ffff888012496900 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff888012496900 RDI: ffff88806d535b80
RBP: ffff88802d32f670 R08: 0000000000000000 R09: ffffffff817f85a5
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88806d535b80
R13: ffff88800635c600 R14: ffff88800f630f00 R15: ffff888012497374
context_switch kernel/sched/core.c:5263 [inline]
__schedule+0x1293/0x38c0 kernel/sched/core.c:6867
preempt_schedule_irq+0x49/0x70 kernel/sched/core.c:7194
irqentry_exit+0xc1/0x5a0 kernel/entry/common.c:216
asm_sysvec_irq_work+0x1a/0x20 arch/x86/include/asm/idtentry.h:733
RIP: 0010:__mod_zone_page_state+0x12/0xf0 mm/vmstat.c:347
Code: 89 ef e8 b1 53 18 00 e9 54 ff ff ff 66 66 2e 0f 1f 84 00 00 00
00 00 90 f3 0f 1e fa 48 b8 00 00 00 00 00 fc ff df 41 55 41 54 <55> 48
89 fd 48 83 c7 70 53 48 89 f9 48 c1 e9 03 48 83 ec 10 80 3c
RSP: 0018:ffff88802d32f898 EFLAGS: 00000286
RAX: dffffc0000000000 RBX: ffff88800c0c4640 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff88807ffdcc00
RBP: ffffea00014a5a00 R08: ffffffff846c1c01 R09: ffff88806d53b6d0
R10: ffff888006278000 R11: ffff8880062785bb R12: 0000000000000000
R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000001
__update_lru_size include/linux/mm_inline.h:48 [inline]
update_lru_size include/linux/mm_inline.h:56 [inline]
lruvec_add_folio include/linux/mm_inline.h:348 [inline]
lru_add+0x44f/0x890 mm/swap.c:154
folio_batch_move_lru+0x110/0x4c0 mm/swap.c:172
__folio_batch_add_and_move+0x27e/0x7e0 mm/swap.c:196
shmem_alloc_and_add_folio mm/shmem.c:1991 [inline]
shmem_get_folio_gfp.isra.0+0xc49/0x1410 mm/shmem.c:2556
shmem_get_folio mm/shmem.c:2662 [inline]
shmem_write_begin+0x197/0x3b0 mm/shmem.c:3315
generic_perform_write+0x37f/0x800 mm/filemap.c:4314
shmem_file_write_iter+0x10d/0x140 mm/shmem.c:3490
new_sync_write fs/read_write.c:593 [inline]
vfs_write fs/read_write.c:686 [inline]
vfs_write+0xabc/0xe90 fs/read_write.c:666
ksys_write+0x121/0x240 fs/read_write.c:738
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xac/0x330 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f9b5abad69f
Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 f9 92 02 00 48 8b 54
24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d
00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 4c 93 02 00 48
RSP: 002b:00007f9b595eddf0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000010000 RCX: 00007f9b5abad69f
RDX: 0000000000010000 RSI: 00007f9b511ce000 RDI: 0000000000000007
RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000000002f2
R10: 00000000000001ce R11: 0000000000000293 R12: 0000000000000007
R13: 00007f9b595edef0 R14: 00007f9b595edeb0 R15: 00007f9b511ce000
</TASK>
^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG] rcu detected stall in shmem_file_write_iter
  2026-02-05 12:57 [BUG] rcu detected stall in shmem_file_write_iter Zw Tang
@ 2026-02-05 13:44 ` David Hildenbrand (Arm)
  2026-02-06 7:37 ` Kunwu Chan
  1 sibling, 0 replies; 5+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-05 13:44 UTC (permalink / raw)
  To: Zw Tang, linux-mm, rcu
  Cc: hughd, akpm, chrisl, kasong, paulmck, frederic, linux-kernel

On 2/5/26 13:57, Zw Tang wrote:
> Hi,
>
> I am reporting a reproducible RCU stall observed on Linux 6.19.0-rc7,
> triggered by a syzkaller C reproducer.
>
> The stall is reported while a userspace task is executing the tmpfs
> (shmem) write path. The blocked task is a syz-executor process, and the
> RCU report consistently shows it running in the shmem write / folio
> allocation path for an extended period of time.
>
> The relevant call trace of the stalled task is:
>
> shmem_file_write_iter
> shmem_write_begin
> shmem_get_folio_gfp
> __folio_batch_add_and_move
> folio_batch_move_lru
> lru_add
> __mod_zone_page_state
>

lru_add()->lruvec_add_folio() calls
update_lru_size()->__update_lru_size()->__mod_zone_page_state().

On SMP systems we end up in mm/vmstat.c:__mod_zone_page_state(). I
don't see anything that could block in there. Livelocking? Or we're
actually stuck somewhere else and the __mod_zone_page_state is
misleading.

Can you try running with lockdep to see if anything pops up?
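
For reference, the SMP version of __mod_zone_page_state() is
essentially just a per-CPU counter update that occasionally folds the
delta into the global zone counter -- roughly this shape (a simplified
sketch from memory, not the exact 6.19 source):

void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
			   long delta)
{
	struct per_cpu_zonestat __percpu *pcp = zone->per_cpu_zonestats;
	s8 __percpu *p = pcp->vm_stat_diff + item;
	long x, t;

	x = delta + __this_cpu_read(*p);
	t = __this_cpu_read(pcp->stat_threshold);
	if (unlikely(abs(x) > t)) {
		/* fold the accumulated per-CPU delta into the zone counter */
		zone_page_state_add(x, zone, item);
		x = 0;
	}
	__this_cpu_write(*p, x);
}

No locks taken, no loops -- nothing in there to block or spin on.

--
Cheers,

David

^ permalink raw reply	[flat|nested] 5+ messages in thread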
* Re: [BUG] rcu detected stall in shmem_file_write_iter
  2026-02-05 12:57 [BUG] rcu detected stall in shmem_file_write_iter Zw Tang
  2026-02-05 13:44 ` David Hildenbrand (Arm)
@ 2026-02-06 7:37 ` Kunwu Chan
  [not found] ` <PS1PPF7E1D7501F6FE46DF307FFB8086E2FAB66A@PS1PPF7E1D7501F.apcprd02.prod.outlook.com>
  1 sibling, 1 reply; 5+ messages in thread
From: Kunwu Chan @ 2026-02-06 7:37 UTC (permalink / raw)
  To: Zw Tang, linux-mm, rcu
  Cc: hughd, akpm, david, chrisl, kasong, paulmck, frederic, linux-kernel

On 2/5/26 20:57, Zw Tang wrote:
> Hi,
>
> I am reporting a reproducible RCU stall observed on Linux 6.19.0-rc7,
> triggered by a syzkaller C reproducer.
>
> The stall is reported while a userspace task is executing the tmpfs
> (shmem) write path. The blocked task is a syz-executor process, and the
> RCU report consistently shows it running in the shmem write / folio
> allocation path for an extended period of time.
>
> The relevant call trace of the stalled task is:
>
> shmem_file_write_iter
> shmem_write_begin
> shmem_get_folio_gfp
> __folio_batch_add_and_move
> folio_batch_move_lru
> lru_add
> __mod_zone_page_state
>
> The kernel eventually reports:
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks
>
> This suggests that the task spends an excessive amount of time in the
> shmem write and folio/LRU accounting path, which prevents the CPU from
> reaching a quiescent state and eventually trips the RCU stall detector.
>
> I am not yet certain whether the stall is caused by heavy memory
> pressure, LRU/zone accounting contention, or an unintended long-running
> critical section in the shmem write path, but the stall is consistently
> associated with shmem_file_write_iter().
>
> Reproducer:
>
> C reproducer: https://pastebin.com/raw/AjQ5a5PL
> console output: https://pastebin.com/raw/FyBF1R7b
> kernel config: https://pastebin.com/raw/LwALTGZ5
>
> Kernel:
> git tree: torvalds/linux
> HEAD commit: 63804fed149a6750ffd28610c5c1c98cce6bd377
> kernel version: 6.19.0-rc7 (QEMU Ubuntu 24.10)
>
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P51643
> rcu: (detected by 1, t=100002 jiffies, g=470049, q=86 ncpus=2)
> task:syz.3.5719 state:R running task stack:25640 pid:51643
> tgid:51627 ppid:49386 task_flags:0x400140 flags:0x00080012
> Call Trace:
> <IRQ>
> sched_show_task kernel/sched/core.c:7821 [inline]
> sched_show_task+0x357/0x510 kernel/sched/core.c:7796
> rcu_print_detail_task_stall_rnp kernel/rcu/tree_stall.h:292 [inline]
> print_other_cpu_stall kernel/rcu/tree_stall.h:681 [inline]
> check_cpu_stall kernel/rcu/tree_stall.h:856 [inline]
> rcu_pending kernel/rcu/tree.c:3667 [inline]
> rcu_sched_clock_irq+0x20ab/0x27e0 kernel/rcu/tree.c:2704
> update_process_times+0xf4/0x160 kernel/time/timer.c:2474
> tick_sched_handle kernel/time/tick-sched.c:298 [inline]
> tick_nohz_handler+0x504/0x720 kernel/time/tick-sched.c:319
> __run_hrtimer kernel/time/hrtimer.c:1777 [inline]
> __hrtimer_run_queues+0x274/0x810 kernel/time/hrtimer.c:1841
> hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1903
> local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1045 [inline]
> __sysvec_apic_timer_interrupt+0x82/0x250 arch/x86/kernel/apic/apic.c:1062
> instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1056 [inline]
> sysvec_apic_timer_interrupt+0x6b/0x80 arch/x86/kernel/apic/apic.c:1056
> </IRQ>
> <TASK>
> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
> RIP: 0010:finish_task_switch+0x128/0x610 kernel/sched/core.c:5118
> Code: 02 00 0f 85 67 04 00 00 49 8b 9c 24 98 0a 00 00 48 85 db 0f 85
> 70 03 00 00 4c 89 e7 e8 61 78 92 02 fb 65 48 8b 1d 68 51 5d 04 <48> 8d
> bb e0 0a 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1
> RSP: 0018:ffff88802d32f630 EFLAGS: 00000286
> RAX: 0000000000000000 RBX: ffff888012496900 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffff888012496900 RDI: ffff88806d535b80
> RBP: ffff88802d32f670 R08: 0000000000000000 R09: ffffffff817f85a5
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff88806d535b80
> R13: ffff88800635c600 R14: ffff88800f630f00 R15: ffff888012497374
> context_switch kernel/sched/core.c:5263 [inline]
> __schedule+0x1293/0x38c0 kernel/sched/core.c:6867
> preempt_schedule_irq+0x49/0x70 kernel/sched/core.c:7194
> irqentry_exit+0xc1/0x5a0 kernel/entry/common.c:216
> asm_sysvec_irq_work+0x1a/0x20 arch/x86/include/asm/idtentry.h:733
> RIP: 0010:__mod_zone_page_state+0x12/0xf0 mm/vmstat.c:347
> Code: 89 ef e8 b1 53 18 00 e9 54 ff ff ff 66 66 2e 0f 1f 84 00 00 00
> 00 00 90 f3 0f 1e fa 48 b8 00 00 00 00 00 fc ff df 41 55 41 54 <55> 48
> 89 fd 48 83 c7 70 53 48 89 f9 48 c1 e9 03 48 83 ec 10 80 3c
> RSP: 0018:ffff88802d32f898 EFLAGS: 00000286
> RAX: dffffc0000000000 RBX: ffff88800c0c4640 RCX: 0000000000000000
> RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff88807ffdcc00
> RBP: ffffea00014a5a00 R08: ffffffff846c1c01 R09: ffff88806d53b6d0
> R10: ffff888006278000 R11: ffff8880062785bb R12: 0000000000000000
> R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000001
> __update_lru_size include/linux/mm_inline.h:48 [inline]
> update_lru_size include/linux/mm_inline.h:56 [inline]
> lruvec_add_folio include/linux/mm_inline.h:348 [inline]
> lru_add+0x44f/0x890 mm/swap.c:154
> folio_batch_move_lru+0x110/0x4c0 mm/swap.c:172
> __folio_batch_add_and_move+0x27e/0x7e0 mm/swap.c:196
> shmem_alloc_and_add_folio mm/shmem.c:1991 [inline]
> shmem_get_folio_gfp.isra.0+0xc49/0x1410 mm/shmem.c:2556
> shmem_get_folio mm/shmem.c:2662 [inline]
> shmem_write_begin+0x197/0x3b0 mm/shmem.c:3315
> generic_perform_write+0x37f/0x800 mm/filemap.c:4314
> shmem_file_write_iter+0x10d/0x140 mm/shmem.c:3490
> new_sync_write fs/read_write.c:593 [inline]
> vfs_write fs/read_write.c:686 [inline]
> vfs_write+0xabc/0xe90 fs/read_write.c:666
> ksys_write+0x121/0x240 fs/read_write.c:738
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xac/0x330 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
> RIP: 0033:0x7f9b5abad69f
> Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 f9 92 02 00 48 8b 54
> 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d
> 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 4c 93 02 00 48
> RSP: 002b:00007f9b595eddf0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 0000000000010000 RCX: 00007f9b5abad69f
> RDX: 0000000000010000 RSI: 00007f9b511ce000 RDI: 0000000000000007
> RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000000002f2
> R10: 00000000000001ce R11: 0000000000000293 R12: 0000000000000007
> R13: 00007f9b595edef0 R14: 00007f9b595edeb0 R15: 00007f9b511ce000
> </TASK>
>

Hi Zw,

__mod_zone_page_state() itself doesn't appear to block, so the
reported frame is likely just where the task was sampled. Given the
task stays in running state, this looks more like a long CPU-bound
section rather than a blocking issue.

With heavy shmem writes, we may be spending significant time in the
folio allocation/LRU paths (for example folio_batch_move_lru() or
alloc_pages() slowpath with reclaim/compaction), which can run for
quite a while without hitting a reschedule point and thus starve RCU.

Could you try enabling lockdep as David suggested? It would also help
to collect some tracing or perf data around the allocation/LRU paths
to see where the CPU time is actually spent.
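
For reference, the batching shape involved here is roughly the
following (a simplified sketch of the mm/swap.c idiom, not the exact
6.19 code): folios are queued in a per-CPU folio_batch and only moved
onto the LRU lists in bulk, under the lruvec lock with IRQs disabled,
once the batch fills up:

typedef void (*move_fn_t)(struct lruvec *lruvec, struct folio *folio);

static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
{
	struct lruvec *lruvec = NULL;
	unsigned long flags = 0;
	int i;

	for (i = 0; i < folio_batch_count(fbatch); i++) {
		struct folio *folio = fbatch->folios[i];

		/* takes (or re-takes) the lruvec lock with IRQs disabled */
		folio_lruvec_relock_irqsave(folio, &lruvec, &flags);
		move_fn(lruvec, folio);
		folio_set_lru(folio);
	}
	if (lruvec)
		unlock_page_lruvec_irqrestore(lruvec, flags);
	folios_put(fbatch);
}

static void __folio_batch_add_and_move(struct folio_batch *fbatch,
				       struct folio *folio, move_fn_t move_fn)
{
	/* queue the folio; flush the whole batch to the LRU once full */
	if (!folio_batch_add(fbatch, folio))
		folio_batch_move_lru(fbatch, move_fn);
}

Each flush is bounded by the batch size (a few dozen folios), so a
single flush is short; the concern is rather the accumulated time
across many iterations of the surrounding write loop without a
reschedule point.

Thanx,
Kunwu

^ permalink raw reply	[flat|nested] 5+ messages in thread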
[parent not found: <PS1PPF7E1D7501F6FE46DF307FFB8086E2FAB66A@PS1PPF7E1D7501F.apcprd02.prod.outlook.com>]
* Re: [BUG] rcu detected stall in shmem_file_write_iter
       [not found] ` <PS1PPF7E1D7501F6FE46DF307FFB8086E2FAB66A@PS1PPF7E1D7501F.apcprd02.prod.outlook.com>
@ 2026-02-06 12:16 ` Zw Tang
  2026-02-09 7:34 ` Kunwu Chan
  0 siblings, 1 reply; 5+ messages in thread
From: Zw Tang @ 2026-02-06 12:16 UTC (permalink / raw)
  To: Kunwu Chan, linux-mm, rcu
  Cc: hughd, akpm, david, chrisl, kasong, paulmck, frederic, linux-kernel

Hi David, hi Kunwu,

thanks a lot for the suggestions.

I reran the reproducer with CONFIG_PROVE_LOCKING=y, CONFIG_LOCKDEP=y,
CONFIG_DEBUG_LOCK_ALLOC=y enabled. Based on the lockdep-enabled run,
here is what I can clarify:

Lockdep does not report any lock inversion, recursive locking, or
circular dependency.

The samples showing __mod_zone_page_state() do not appear to indicate
a blocking point; this frame indeed seems to be just where the task
was sampled.

From the timeline of the reports, the earliest problematic behavior
appears before the MM/LRU-heavy paths. In the first hung-task report,
multiple repro1 threads are already blocked in:

down_write()
 └─ rwbase_write_lock()
    └─ __rt_mutex_slowlock_locked()
       └─ rt_mutex_schedule()

via the do_vfs_ioctl() → perf_fasync() path, and are in D state for
more than 143 seconds at that point.

After several threads are stuck there, the system degrades further:
other threads remain in R state, spending long, uninterrupted time in
MM allocation / LRU paths (alloc_pages(), get_page_from_freelist(),
__handle_mm_fault()), without hitting reschedule points. This then
leads to RCU preempt stalls, and eventually workqueue lockups
(e.g. vmstat_shepherd, do_cache_clean, wb_workfn).

Lockdep's "show all locks held" output does not show the blocked
repro1 threads holding any MM/LRU/zone locks themselves; they
typically only hold the filesystem mutex at that point, which suggests
the contended RT rwsem is held elsewhere.

Overall, this currently looks less like a single blocking bug in
__mod_zone_page_state(), and more like a PREEMPT_RT-specific
starvation scenario, where long-held RT rwsems in the ioctl/perf path
combined with long CPU-bound MM/LRU execution amplify into RCU
starvation and workqueue lockups.

Below is the earliest hung-task report from the lockdep-enabled run
for reference:

[386.499937] INFO: task repro1:2066 blocked for more than 143 seconds.
[386.499956] Not tainted 6.19.0-rc7 #4
[386.499964] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[386.499970] task:repro1 state:D stack:28400 pid:2066 tgid:2066 ppid:293 2
[386.500022] Call Trace:
[386.500027] <TASK>
[386.500037] __schedule+0x1198/0x3f00
[386.500069] ? io_schedule_timeout+0x80/0x80
[386.500088] ? kvm_sched_clock_read+0x16/0x20
[386.500111] ? local_clock_noinstr+0xf/0xc0
[386.500125] ? __rt_mutex_slowlock_locked.constprop.0+0xecd/0x30c0
[386.500148] rt_mutex_schedule+0x9f/0xe0
[386.500171] __rt_mutex_slowlock_locked.constprop.0+0xedc/0x30c0
[386.500197] ? down_write_trylock+0x1a0/0x1a0
[386.500222] ? lock_acquired+0xbd/0x340
[386.500245] rwbase_write_lock+0x744/0xa80
[386.500266] ? perf_fasync+0xc0/0x130
[386.500284] ? rt_mutex_adjust_prio_chain.isra.0+0x3240/0x3240
[386.500304] ? kvm_sched_clock_read+0x16/0x20
[386.500329] ? perf_fasync+0xc0/0x130
[386.500344] ? local_clock+0x10/0x20
[386.500364] ? lock_contended+0x189/0x420
[386.500385] down_write+0x6e/0x1e0
[386.500405] perf_fasync+0xc0/0x130
[386.500421] ? perf_cgroup_css_free+0x50/0x50
[386.500440] do_vfs_ioctl+0x9b9/0x1480
[386.500457] ? lock_vma_under_rcu+0x7ee/0xd90
[386.500475] ? ioctl_file_clone+0xf0/0xf0
[386.500490] ? lock_is_held_type+0xa0/0x110
[386.500506] ? handle_mm_fault+0x5a6/0x9d0
[386.500526] ? kvm_sched_clock_read+0x16/0x20
[386.502053] ? local_clock_noinstr+0xf/0xc0
[386.502073] ? handle_mm_fault+0x5a6/0x9d0
[386.502092] ? exc_page_fault+0xb0/0x180
[386.502106] ? kvm_sched_clock_read+0x16/0x20
[386.502129] ? local_clock_noinstr+0xf/0xc0
[386.502142] ? exc_page_fault+0xb0/0x180
[386.502154] ? local_clock+0x10/0x20
[386.502174] ? lock_release+0x258/0x3c0
[386.502196] ? irqentry_exit+0xf0/0x6d0
[386.502213] __x64_sys_ioctl+0x112/0x220
[386.502232] do_syscall_64+0xc3/0x430
[386.502253] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[386.502269] RIP: 0033:0x7f62f7922fc9
[386.502351] </TASK>
[386.502357] INFO: task repro1:2072 blocked for more than 143 seconds.
[386.502366] Not tainted 6.19.0-rc7 #4
[386.502373] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[386.502378] task:repro1 state:D stack:28400 pid:2072 tgid:2072 ppid:294 2
[386.502427] Call Trace:
[386.502431] <TASK>
[386.502439] __schedule+0x1198/0x3f00
[386.502463] ? io_schedule_timeout+0x80/0x80
[386.502483] ? mark_held_locks+0x50/0x80
[386.502505] rt_mutex_schedule+0x9f/0xe0
[386.502527] __rt_mutex_slowlock_locked.constprop.0+0xedc/0x30c0
[386.503218] ? down_write_trylock+0x1a0/0x1a0
[386.503246] ? lock_acquired+0xbd/0x340
[386.503269] rwbase_write_lock+0x744/0xa80
[386.503290] ? perf_fasync+0xc0/0x130
[386.503306] ? rt_mutex_adjust_prio_chain.isra.0+0x3240/0x3240
[386.503327] ? kvm_sched_clock_read+0x16/0x20
[386.503351] ? perf_fasync+0xc0/0x130
[386.503366] ? local_clock+0x10/0x20
[386.503386] ? lock_contended+0x189/0x420
[386.503407] down_write+0x6e/0x1e0
[386.503427] perf_fasync+0xc0/0x130
[386.503442] ? perf_cgroup_css_free+0x50/0x50
[386.503461] do_vfs_ioctl+0x9b9/0x1480
[386.503476] ? lock_vma_under_rcu+0x7ee/0xd90
[386.503493] ? ioctl_file_clone+0xf0/0xf0
[386.503508] ? lock_is_held_type+0xa0/0x110
[386.503524] ? handle_mm_fault+0x5a6/0x9d0
[386.503543] ? kvm_sched_clock_read+0x16/0x20
[386.504012] ? local_clock_noinstr+0xf/0xc0
[386.504049] ? exc_page_fault+0xb0/0x180
[386.504312] ? irqentry_exit+0xf0/0x6d0
[386.504330] __x64_sys_ioctl+0x112/0x220
[386.504369] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[386.504464] </TASK>
[386.504470] INFO: task repro1:2073 blocked for more than 143 seconds.
[386.504491] task:repro1 state:D stack:28400 pid:2073 tgid:2073 ppid:292 2
[386.504540] Call Trace:
[386.504544] <TASK>
[386.505300] __schedule+0x1198/0x3f00
[386.505347] ? mark_held_locks+0x50/0x80
[386.505369] rt_mutex_schedule+0x9f/0xe0
[386.505391] __rt_mutex_slowlock_locked.constprop.0+0xedc/0x30c0
[386.505464] rwbase_write_lock+0x744/0xa80
[386.505988] down_write+0x6e/0x1e0
[386.506042] do_vfs_ioctl+0x9b9/0x1480
[386.506301] __x64_sys_ioctl+0x112/0x220
[386.506340] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[386.506434] </TASK>
[386.506442] Showing all locks held in the system:
[386.506447] 4 locks held by pr/legacy/16:
[386.506456] 1 lock held by khungtaskd/37:
[386.506464] #0: ffffffff85041540 (rcu_read_lock){....}-{1:3}
[386.506503] 1 lock held by in:imklog/196:
[386.506513] 1 lock held by repro1/2040:
[386.506522] 1 lock held by repro1/2066:
[386.506532] #0: ffff88800784bc50 (&sb->s_type->i_mutex_key#17)
[386.507276] 1 lock held by repro1/2072:
[386.507284] #0: ffff88800784bc50 (&sb->s_type->i_mutex_key#17)
[386.507321] 1 lock held by repro1/2073:
[386.507328] #0: ffff88800784bc50 (&sb->s_type->i_mutex_key#17)
[427.459692] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 40s!
[427.459779] workqueue events:
[427.459809] pending: vmstat_shepherd, e1000_watchdog
[427.460020] workqueue events_freezable_pwr_efficient:
[427.460020] in-flight: disk_events_workfn
[427.460052] workqueue writeback:
[427.460084] in-flight: wb_workfn
[427.460231] Showing backtraces of running workers in stalled CPU-bound worker pools
Message from syslogd@syzkaller at Feb 6 10:27:59 ...
kernel:[ 427.459692] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuc!
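
For context on where the writers block: perf_fasync() takes the inode
lock around fasync_helper(), i.e. a down_write() of i_rwsem, which on
PREEMPT_RT goes through rwbase_write_lock() as in the traces above.
This is a simplified sketch of kernel/events/core.c from memory, not
verified against this exact tree:

static int perf_fasync(int fd, struct file *filp, int on)
{
	struct inode *inode = file_inode(filp);
	struct perf_event *event = filp->private_data;
	int retval;

	/* all fasync state changes serialize on this inode's i_rwsem */
	inode_lock(inode);
	retval = fasync_helper(fd, filp, on, &event->fasync);
	inode_unlock(inode);

	if (retval < 0)
		return retval;
	return 0;
}

This is the down_write() the blocked repro1 threads sit in; I have not
confirmed whether it is the same rwsem instance shown in the "locks
held" list above.

Thanks
Zw Tang

^ permalink raw reply	[flat|nested] 5+ messages in thread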
* Re: [BUG] rcu detected stall in shmem_file_write_iter
  2026-02-06 12:16 ` Re: " Zw Tang
@ 2026-02-09 7:34 ` Kunwu Chan
  0 siblings, 0 replies; 5+ messages in thread
From: Kunwu Chan @ 2026-02-09 7:34 UTC (permalink / raw)
  To: Zw Tang, linux-mm, rcu
  Cc: hughd, akpm, david, chrisl, kasong, paulmck, frederic, linux-kernel

On 2/6/26 20:16, Zw Tang wrote:
> Hi David, hi Kunwu,
>
> thanks a lot for the suggestions.
>
> I reran the reproducer with CONFIG_PROVE_LOCKING=y, CONFIG_LOCKDEP=y,
> CONFIG_DEBUG_LOCK_ALLOC=y enabled. Based on the lockdep-enabled run,
> here is what I can clarify:
>
> Lockdep does not report any lock inversion, recursive locking, or
> circular dependency.
>
> The samples showing __mod_zone_page_state() do not appear to indicate
> a blocking point; this frame indeed seems to be just where the task
> was sampled.

Thanks for the lockdep-enabled rerun. Agreed that
__mod_zone_page_state() is most likely just a sampling point.

> From the timeline of the reports, the earliest problematic behavior
> appears before the MM/LRU-heavy paths. In the first hung-task report,
> multiple repro1 threads are already blocked in:
>
> down_write()
>  └─ rwbase_write_lock()
>     └─ __rt_mutex_slowlock_locked()
>        └─ rt_mutex_schedule()
>
> via the do_vfs_ioctl() → perf_fasync() path, and are in D state for
> more than 143 seconds at that point.
>
> After several threads are stuck there, the system degrades further:
> other threads remain in R state, spending long, uninterrupted time in
> MM allocation / LRU paths (alloc_pages(), get_page_from_freelist(),
> __handle_mm_fault()), without hitting reschedule points. This then
> leads to RCU preempt stalls, and eventually workqueue lockups
> (e.g. vmstat_shepherd, do_cache_clean, wb_workfn).
>
> Lockdep's "show all locks held" output does not show the blocked
> repro1 threads holding any MM/LRU/zone locks themselves; they
> typically only hold the filesystem mutex at that point, which suggests
> the contended RT rwsem is held elsewhere.
>
> Overall, this currently looks less like a single blocking bug in
> __mod_zone_page_state(), and more like a PREEMPT_RT-specific
> starvation scenario, where long-held RT rwsems in the ioctl/perf path
> combined with long CPU-bound MM/LRU execution amplify into RCU
> starvation and workqueue lockups.
>
> Below is the earliest hung-task report from the lockdep-enabled run
> for reference:
>
> [386.499937] INFO: task repro1:2066 blocked for more than 143 seconds.
> [386.499956] Not tainted 6.19.0-rc7 #4
> [386.499964] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [386.499970] task:repro1 state:D stack:28400 pid:2066 tgid:2066 ppid:293 2

The earliest hung tasks are blocked in perf_fasync() at inode_lock()
(down_write(&inode->i_rwsem)), which indicates heavy inode rwsem
contention. However, the waiter stacks alone don't identify the lock
holder.

To move this forward, could you capture a SysRq-T (and optionally
SysRq-w) at the time of the hang so we can inspect the system state
and help identify the lock holder/CPU hog, plus any PREEMPT_RT
PI/owner-chain information for the underlying rt_mutex/rwsem (if
available)?

Thanx,
Kunwu

> [386.500022] Call Trace:
> [386.500027] <TASK>
> [386.500037] __schedule+0x1198/0x3f00
> [386.500069] ? io_schedule_timeout+0x80/0x80
> [386.500088] ? kvm_sched_clock_read+0x16/0x20
> [386.500111] ? local_clock_noinstr+0xf/0xc0
> [386.500125] ? __rt_mutex_slowlock_locked.constprop.0+0xecd/0x30c0
> [386.500148] rt_mutex_schedule+0x9f/0xe0
> [386.500171] __rt_mutex_slowlock_locked.constprop.0+0xedc/0x30c0
> [386.500197] ? down_write_trylock+0x1a0/0x1a0
> [386.500222] ? lock_acquired+0xbd/0x340
> [386.500245] rwbase_write_lock+0x744/0xa80
> [386.500266] ? perf_fasync+0xc0/0x130
> [386.500284] ? rt_mutex_adjust_prio_chain.isra.0+0x3240/0x3240
> [386.500304] ? kvm_sched_clock_read+0x16/0x20
> [386.500329] ? perf_fasync+0xc0/0x130
> [386.500344] ? local_clock+0x10/0x20
> [386.500364] ? lock_contended+0x189/0x420
> [386.500385] down_write+0x6e/0x1e0
> [386.500405] perf_fasync+0xc0/0x130
> [386.500421] ? perf_cgroup_css_free+0x50/0x50
> [386.500440] do_vfs_ioctl+0x9b9/0x1480
> [386.500457] ? lock_vma_under_rcu+0x7ee/0xd90
> [386.500475] ? ioctl_file_clone+0xf0/0xf0
> [386.500490] ? lock_is_held_type+0xa0/0x110
> [386.500506] ? handle_mm_fault+0x5a6/0x9d0
> [386.500526] ? kvm_sched_clock_read+0x16/0x20
> [386.502053] ? local_clock_noinstr+0xf/0xc0
> [386.502073] ? handle_mm_fault+0x5a6/0x9d0
> [386.502092] ? exc_page_fault+0xb0/0x180
> [386.502106] ? kvm_sched_clock_read+0x16/0x20
> [386.502129] ? local_clock_noinstr+0xf/0xc0
> [386.502142] ? exc_page_fault+0xb0/0x180
> [386.502154] ? local_clock+0x10/0x20
> [386.502174] ? lock_release+0x258/0x3c0
> [386.502196] ? irqentry_exit+0xf0/0x6d0
> [386.502213] __x64_sys_ioctl+0x112/0x220
> [386.502232] do_syscall_64+0xc3/0x430
> [386.502253] entry_SYSCALL_64_after_hwframe+0x4b/0x53
> [386.502269] RIP: 0033:0x7f62f7922fc9
> [386.502351] </TASK>
> [386.502357] INFO: task repro1:2072 blocked for more than 143 seconds.
> [386.502366] Not tainted 6.19.0-rc7 #4
> [386.502373] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [386.502378] task:repro1 state:D stack:28400 pid:2072 tgid:2072 ppid:294 2
> [386.502427] Call Trace:
> [386.502431] <TASK>
> [386.502439] __schedule+0x1198/0x3f00
> [386.502463] ? io_schedule_timeout+0x80/0x80
> [386.502483] ? mark_held_locks+0x50/0x80
> [386.502505] rt_mutex_schedule+0x9f/0xe0
> [386.502527] __rt_mutex_slowlock_locked.constprop.0+0xedc/0x30c0
> [386.503218] ? down_write_trylock+0x1a0/0x1a0
> [386.503246] ? lock_acquired+0xbd/0x340
> [386.503269] rwbase_write_lock+0x744/0xa80
> [386.503290] ? perf_fasync+0xc0/0x130
> [386.503306] ? rt_mutex_adjust_prio_chain.isra.0+0x3240/0x3240
> [386.503327] ? kvm_sched_clock_read+0x16/0x20
> [386.503351] ? perf_fasync+0xc0/0x130
> [386.503366] ? local_clock+0x10/0x20
> [386.503386] ? lock_contended+0x189/0x420
> [386.503407] down_write+0x6e/0x1e0
> [386.503427] perf_fasync+0xc0/0x130
> [386.503442] ? perf_cgroup_css_free+0x50/0x50
> [386.503461] do_vfs_ioctl+0x9b9/0x1480
> [386.503476] ? lock_vma_under_rcu+0x7ee/0xd90
> [386.503493] ? ioctl_file_clone+0xf0/0xf0
> [386.503508] ? lock_is_held_type+0xa0/0x110
> [386.503524] ? handle_mm_fault+0x5a6/0x9d0
> [386.503543] ? kvm_sched_clock_read+0x16/0x20
> [386.504012] ? local_clock_noinstr+0xf/0xc0
> [386.504049] ? exc_page_fault+0xb0/0x180
> [386.504312] ? irqentry_exit+0xf0/0x6d0
> [386.504330] __x64_sys_ioctl+0x112/0x220
> [386.504369] entry_SYSCALL_64_after_hwframe+0x4b/0x53
> [386.504464] </TASK>
> [386.504470] INFO: task repro1:2073 blocked for more than 143 seconds.
> [386.504491] task:repro1 state:D stack:28400 pid:2073 tgid:2073 ppid:292 2
> [386.504540] Call Trace:
> [386.504544] <TASK>
> [386.505300] __schedule+0x1198/0x3f00
> [386.505347] ? mark_held_locks+0x50/0x80
> [386.505369] rt_mutex_schedule+0x9f/0xe0
> [386.505391] __rt_mutex_slowlock_locked.constprop.0+0xedc/0x30c0
> [386.505464] rwbase_write_lock+0x744/0xa80
> [386.505988] down_write+0x6e/0x1e0
> [386.506042] do_vfs_ioctl+0x9b9/0x1480
> [386.506301] __x64_sys_ioctl+0x112/0x220
> [386.506340] entry_SYSCALL_64_after_hwframe+0x4b/0x53
> [386.506434] </TASK>
> [386.506442] Showing all locks held in the system:
> [386.506447] 4 locks held by pr/legacy/16:
> [386.506456] 1 lock held by khungtaskd/37:
> [386.506464] #0: ffffffff85041540 (rcu_read_lock){....}-{1:3}
> [386.506503] 1 lock held by in:imklog/196:
> [386.506513] 1 lock held by repro1/2040:
> [386.506522] 1 lock held by repro1/2066:
> [386.506532] #0: ffff88800784bc50 (&sb->s_type->i_mutex_key#17)
> [386.507276] 1 lock held by repro1/2072:
> [386.507284] #0: ffff88800784bc50 (&sb->s_type->i_mutex_key#17)
> [386.507321] 1 lock held by repro1/2073:
> [386.507328] #0: ffff88800784bc50 (&sb->s_type->i_mutex_key#17)
> [427.459692] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 40s!
> [427.459779] workqueue events:
> [427.459809] pending: vmstat_shepherd, e1000_watchdog
> [427.460020] workqueue events_freezable_pwr_efficient:
> [427.460020] in-flight: disk_events_workfn
> [427.460052] workqueue writeback:
> [427.460084] in-flight: wb_workfn
> [427.460231] Showing backtraces of running workers in stalled CPU-bound worker pools
> Message from syslogd@syzkaller at Feb 6 10:27:59 ...
> kernel:[ 427.459692] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuc!
>
> Thanks
> Zw Tang

^ permalink raw reply	[flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-02-09 7:35 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-05 12:57 [BUG] rcu detected stall in shmem_file_write_iter Zw Tang
2026-02-05 13:44 ` David Hildenbrand (Arm)
2026-02-06 7:37 ` Kunwu Chan
[not found] ` <PS1PPF7E1D7501F6FE46DF307FFB8086E2FAB66A@PS1PPF7E1D7501F.apcprd02.prod.outlook.com>
2026-02-06 12:16 ` Re: " Zw Tang
2026-02-09 7:34 ` Kunwu Chan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox