linux-mm.kvack.org archive mirror
* [PATCH v2] mm/vmalloc: prevent RCU stalls in kasan_release_vmalloc_node
@ 2026-01-12 10:36 Deepanshu Kartikey
  2026-01-12 11:09 ` Uladzislau Rezki
  2026-01-12 12:08 ` Hillf Danton
  0 siblings, 2 replies; 7+ messages in thread
From: Deepanshu Kartikey @ 2026-01-12 10:36 UTC (permalink / raw)
  To: akpm, urezki
  Cc: linux-mm, linux-kernel, Deepanshu Kartikey, syzbot+d8d4c31d40f868eaea30

When CONFIG_PAGE_OWNER is enabled, freeing KASAN shadow pages during
vmalloc cleanup triggers expensive stack unwinding that acquires RCU
read locks. Processing a large purge_list without rescheduling can
cause the task to hold CPU for extended periods (10+ seconds), leading
to RCU stalls and potential OOM conditions.

The issue manifests in purge_vmap_node() -> kasan_release_vmalloc_node()
where iterating through hundreds or thousands of vmap_area entries and
freeing their associated shadow pages causes:

  rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P6229/1:b..l
  ...
  task:kworker/0:17 state:R running task stack:28840 pid:6229
  ...
  kasan_release_vmalloc_node mm/vmalloc.c:2282 [inline]
  purge_vmap_node+0x1ba/0xad0 mm/vmalloc.c:2299

Each call to kasan_release_vmalloc() can free many pages, and with
page_owner tracking, each free triggers save_stack() which performs
stack unwinding under RCU read lock. Without yielding, this creates
an unbounded RCU critical section.

Add periodic cond_resched() calls within the loop to allow:
- RCU grace periods to complete
- Other tasks to run
- Scheduler to preempt when needed

The fix uses need_resched() for immediate response under load, with
a batch count of 32 as a guaranteed upper bound to prevent worst-case
stalls even under light load.

Reported-by: syzbot+d8d4c31d40f868eaea30@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d8d4c31d40f868eaea30
Link: https://lore.kernel.org/all/20260112084723.622910-1-kartikey406@gmail.com/T/ [v1]
Suggested-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
v2: Use a macro for batch size (suggested by Uladzislau Rezki)
---
 mm/vmalloc.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 41dd01e8430c..51e58701565d 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2268,11 +2268,14 @@ decay_va_pool_node(struct vmap_node *vn, bool full_decay)
 	reclaim_list_global(&decay_list);
 }
 
+#define KASAN_RELEASE_BATCH_SIZE 32
+
 static void
 kasan_release_vmalloc_node(struct vmap_node *vn)
 {
 	struct vmap_area *va;
 	unsigned long start, end;
+	unsigned int batch_count = 0;
 
 	start = list_first_entry(&vn->purge_list, struct vmap_area, list)->va_start;
 	end = list_last_entry(&vn->purge_list, struct vmap_area, list)->va_end;
@@ -2282,6 +2285,11 @@ kasan_release_vmalloc_node(struct vmap_node *vn)
 			kasan_release_vmalloc(va->va_start, va->va_end,
 				va->va_start, va->va_end,
 				KASAN_VMALLOC_PAGE_RANGE);
+
+		if (need_resched() || (++batch_count >= KASAN_RELEASE_BATCH_SIZE)) {
+			cond_resched();
+			batch_count = 0;
+		}
 	}
 
 	kasan_release_vmalloc(start, end, start, end, KASAN_VMALLOC_TLB_FLUSH);
-- 
2.43.0



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] mm/vmalloc: prevent RCU stalls in kasan_release_vmalloc_node
  2026-01-12 10:36 [PATCH v2] mm/vmalloc: prevent RCU stalls in kasan_release_vmalloc_node Deepanshu Kartikey
@ 2026-01-12 11:09 ` Uladzislau Rezki
  2026-01-12 12:08 ` Hillf Danton
  1 sibling, 0 replies; 7+ messages in thread
From: Uladzislau Rezki @ 2026-01-12 11:09 UTC (permalink / raw)
  To: Deepanshu Kartikey
  Cc: akpm, urezki, linux-mm, linux-kernel, syzbot+d8d4c31d40f868eaea30

On Mon, Jan 12, 2026 at 04:06:12PM +0530, Deepanshu Kartikey wrote:
> When CONFIG_PAGE_OWNER is enabled, freeing KASAN shadow pages during
> vmalloc cleanup triggers expensive stack unwinding that acquires RCU
> read locks. Processing a large purge_list without rescheduling can
> cause the task to hold CPU for extended periods (10+ seconds), leading
> to RCU stalls and potential OOM conditions.
> 
> [...]
> 
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

--
Uladzislau Rezki



* Re: [PATCH v2] mm/vmalloc: prevent RCU stalls in kasan_release_vmalloc_node
  2026-01-12 10:36 [PATCH v2] mm/vmalloc: prevent RCU stalls in kasan_release_vmalloc_node Deepanshu Kartikey
  2026-01-12 11:09 ` Uladzislau Rezki
@ 2026-01-12 12:08 ` Hillf Danton
  2026-01-12 13:13   ` [syzbot] [mm?] INFO: rcu detected stall in purge_vmap_node syzbot
  1 sibling, 1 reply; 7+ messages in thread
From: Hillf Danton @ 2026-01-12 12:08 UTC (permalink / raw)
  To: Deepanshu Kartikey
  Cc: Uladzislau Rezki, akpm, linux-mm, linux-kernel, syzkaller-bugs,
	syzbot+d8d4c31d40f868eaea30

#syz test

When CONFIG_PAGE_OWNER is enabled, freeing KASAN shadow pages during
vmalloc cleanup triggers expensive stack unwinding that acquires RCU
read locks. Processing a large purge_list without rescheduling can
cause the task to hold CPU for extended periods (10+ seconds), leading
to RCU stalls and potential OOM conditions.

The issue manifests in purge_vmap_node() -> kasan_release_vmalloc_node()
where iterating through hundreds or thousands of vmap_area entries and
freeing their associated shadow pages causes:

  rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P6229/1:b..l
  ...
  task:kworker/0:17 state:R running task stack:28840 pid:6229
  ...
  kasan_release_vmalloc_node mm/vmalloc.c:2282 [inline]
  purge_vmap_node+0x1ba/0xad0 mm/vmalloc.c:2299

Each call to kasan_release_vmalloc() can free many pages, and with
page_owner tracking, each free triggers save_stack() which performs
stack unwinding under RCU read lock. Without yielding, this creates
an unbounded RCU critical section.

Add periodic cond_resched() calls within the loop to allow:
- RCU grace periods to complete
- Other tasks to run
- Scheduler to preempt when needed

The fix uses need_resched() for immediate response under load, with
a batch count of 32 as a guaranteed upper bound to prevent worst-case
stalls even under light load.

Reported-by: syzbot+d8d4c31d40f868eaea30@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d8d4c31d40f868eaea30
Link: https://lore.kernel.org/all/20260112084723.622910-1-kartikey406@gmail.com/T/ [v1]
Suggested-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
v2: Use a macro for batch size (suggested by Uladzislau Rezki)
---
 mm/vmalloc.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 41dd01e8430c..51e58701565d 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2268,11 +2268,14 @@ decay_va_pool_node(struct vmap_node *vn, bool full_decay)
 	reclaim_list_global(&decay_list);
 }
 
+#define KASAN_RELEASE_BATCH_SIZE 32
+
 static void
 kasan_release_vmalloc_node(struct vmap_node *vn)
 {
 	struct vmap_area *va;
 	unsigned long start, end;
+	unsigned int batch_count = 0;
 
 	start = list_first_entry(&vn->purge_list, struct vmap_area, list)->va_start;
 	end = list_last_entry(&vn->purge_list, struct vmap_area, list)->va_end;
@@ -2282,6 +2285,11 @@ kasan_release_vmalloc_node(struct vmap_node *vn)
 			kasan_release_vmalloc(va->va_start, va->va_end,
 				va->va_start, va->va_end,
 				KASAN_VMALLOC_PAGE_RANGE);
+
+		if (need_resched() || (++batch_count >= KASAN_RELEASE_BATCH_SIZE)) {
+			cond_resched();
+			batch_count = 0;
+		}
 	}
 
 	kasan_release_vmalloc(start, end, start, end, KASAN_VMALLOC_TLB_FLUSH);
-- 
2.43.0



* Re: [syzbot] [mm?] INFO: rcu detected stall in purge_vmap_node
  2026-01-12 12:08 ` Hillf Danton
@ 2026-01-12 13:13   ` syzbot
  2026-01-12 13:38     ` Hillf Danton
  0 siblings, 1 reply; 7+ messages in thread
From: syzbot @ 2026-01-12 13:13 UTC (permalink / raw)
  To: akpm, hdanton, kartikey406, linux-kernel, linux-mm,
	syzkaller-bugs, urezki

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: rcu detected stall in x64_sys_call

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-1): P6717/1:b..l P5934/1:b..l
rcu: 	(detected by 1, t=10502 jiffies, g=15885, q=682 ncpus=2)
task:kworker/1:4     state:R  running task     stack:27080 pid:5934  tgid:5934  ppid:2      task_flags:0x4208060 flags:0x00080000
Workqueue: events cleanup_vm_area_work
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5256 [inline]
 __schedule+0x1139/0x6150 kernel/sched/core.c:6863
 preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7190
 irqentry_exit+0x1d8/0x8c0 kernel/entry/common.c:216
 asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
RIP: 0010:unwind_next_frame+0x14e/0x20b0 arch/x86/kernel/unwind_orc.c:510
Code: c0 74 08 3c 01 0f 8e 20 0a 00 00 41 f6 86 88 00 00 00 03 0f 85 71 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8d 75 35 4c 89 f2 <48> c1 ea 03 0f b6 04 02 4c 89 f2 83 e2 07 38 d0 7f 08 84 c0 0f 85
RSP: 0018:ffffc900033477a8 EFLAGS: 00000246
RAX: dffffc0000000000 RBX: 0000000000000001 RCX: 00000000c5fffc65
RDX: ffffc9000334784d RSI: ffffffff8bf2b380 RDI: ffffffff8dd7cf28
RBP: ffffc90003347860 R08: 0000000076218ff6 R09: 00000000676218ff
R10: 0000000000000002 R11: ffff888025fb54b0 R12: ffffc90003347868
R13: ffffc90003347818 R14: ffffc9000334784d R15: ffff888025fb4980
 arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
 stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
 save_stack+0x160/0x1f0 mm/page_owner.c:165
 __reset_page_owner+0x84/0x1a0 mm/page_owner.c:320
 reset_page_owner include/linux/page_owner.h:25 [inline]
 free_pages_prepare mm/page_alloc.c:1406 [inline]
 __free_frozen_pages+0x7df/0x1170 mm/page_alloc.c:2943
 vfree+0x1fd/0xb50 mm/vmalloc.c:3474
 cleanup_vm_area_work+0x4c/0x100 mm/vmalloc.c:3771
 process_one_work+0x9ba/0x1b20 kernel/workqueue.c:3257
 process_scheduled_works kernel/workqueue.c:3340 [inline]
 worker_thread+0x6c8/0xf10 kernel/workqueue.c:3421
 kthread+0x3c5/0x780 kernel/kthread.c:463
 ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
 </TASK>
task:cmp             state:R  running task     stack:25800 pid:6717  tgid:6717  ppid:6660   task_flags:0x40000c flags:0x00080000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5256 [inline]
 __schedule+0x1139/0x6150 kernel/sched/core.c:6863
 preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7190
 irqentry_exit+0x1d8/0x8c0 kernel/entry/common.c:216
 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
RIP: 0010:rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
RIP: 0010:rcu_read_lock include/linux/rcupdate.h:867 [inline]
RIP: 0010:class_rcu_constructor include/linux/rcupdate.h:1195 [inline]
RIP: 0010:unwind_next_frame+0xbe/0x20b0 arch/x86/kernel/unwind_orc.c:495
Code: ea 03 80 3c 02 00 0f 85 5d 18 00 00 49 8b 45 38 48 89 44 24 10 e8 c2 47 36 00 31 d2 45 31 c9 45 31 c0 48 8d 05 00 00 00 00 50 <b9> 02 00 00 00 31 f6 48 c7 c7 a0 96 3c 8e e8 1f 45 2d 00 e8 da 80
RSP: 0018:ffffc900032e75b8 EFLAGS: 00000246
RAX: ffffffff816cb66d RBX: 0000000000000001 RCX: ffffc900032e7584
RDX: 0000000000000000 RSI: ffffffff817b2e92 RDI: ffff8880338ea944
RBP: ffffc900032e7678 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000002 R11: 0000000000007b2b R12: ffffffff81a96170
R13: ffffc900032e7630 R14: 0000000000000000 R15: ffff8880338ea4c0
 arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
 stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
 kasan_save_stack+0x33/0x60 mm/kasan/common.c:57
 kasan_save_track+0x14/0x30 mm/kasan/common.c:78
 kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:584
 poison_slab_object mm/kasan/common.c:253 [inline]
 __kasan_slab_free+0x5f/0x80 mm/kasan/common.c:285
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:2540 [inline]
 slab_free mm/slub.c:6670 [inline]
 kmem_cache_free+0x2d8/0x770 mm/slub.c:6781
 anon_vma_chain_free mm/rmap.c:146 [inline]
 unlink_anon_vmas+0x173/0x820 mm/rmap.c:420
 free_pgtables+0x212/0xc10 mm/memory.c:414
 exit_mmap+0x3f1/0xb60 mm/mmap.c:1288
 __mmput+0x12a/0x410 kernel/fork.c:1173
 mmput+0x62/0x70 kernel/fork.c:1196
 exit_mm kernel/exit.c:581 [inline]
 do_exit+0x7d7/0x2bd0 kernel/exit.c:959
 do_group_exit+0xd3/0x2a0 kernel/exit.c:1112
 __do_sys_exit_group kernel/exit.c:1123 [inline]
 __se_sys_exit_group kernel/exit.c:1121 [inline]
 __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1121
 x64_sys_call+0x151c/0x1740 arch/x86/include/generated/asm/syscalls_64.h:232
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f3b0dc696c5
RSP: 002b:00007ffe72640248 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007f3b0dd6afe8 RCX: 00007f3b0dc696c5
RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000000
RBP: 0000000000000001 R08: 00007ffe726401d8 R09: 0000000000000000
R10: 00007ffe72640070 R11: 0000000000000206 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f3b0dd69680 R15: 00007f3b0dd6b000
 </TASK>
rcu: rcu_preempt kthread starved for 10573 jiffies! g15885 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:R  running task     stack:27720 pid:16    tgid:16    ppid:2      task_flags:0x208040 flags:0x00080000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5256 [inline]
 __schedule+0x1139/0x6150 kernel/sched/core.c:6863
 __schedule_loop kernel/sched/core.c:6945 [inline]
 schedule+0xe7/0x3a0 kernel/sched/core.c:6960
 schedule_timeout+0x123/0x290 kernel/time/sleep_timeout.c:99
 rcu_gp_fqs_loop+0x1ea/0xaf0 kernel/rcu/tree.c:2083
 rcu_gp_kthread+0x26d/0x380 kernel/rcu/tree.c:2285
 kthread+0x3c5/0x780 kernel/kthread.c:463
 ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
 </TASK>
rcu: Stack dump where RCU GP kthread last ran:
CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
RIP: 0010:pv_native_safe_halt+0xf/0x20 arch/x86/kernel/paravirt.c:82
Code: a6 5f 02 c3 cc cc cc cc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d 13 19 12 00 fb f4 <e9> cc 35 03 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90
RSP: 0000:ffffc90000197de8 EFLAGS: 000002c6
RAX: 00000000000e7bab RBX: 0000000000000001 RCX: ffffffff8b7846d9
RDX: 0000000000000000 RSI: ffffffff8daceab2 RDI: ffffffff8bf2b400
RBP: ffffed1003b56498 R08: 0000000000000001 R09: ffffed10170a673d
R10: ffff8880b85339eb R11: ffff88801dab2ff0 R12: 0000000000000001
R13: ffff88801dab24c0 R14: ffffffff9088bdd0 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8881249f5000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555787857d0 CR3: 000000003ccf6000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 arch_safe_halt arch/x86/include/asm/paravirt.h:107 [inline]
 default_idle+0x13/0x20 arch/x86/kernel/process.c:767
 default_idle_call+0x6c/0xb0 kernel/sched/idle.c:122
 cpuidle_idle_call kernel/sched/idle.c:191 [inline]
 do_idle+0x38d/0x510 kernel/sched/idle.c:332
 cpu_startup_entry+0x4f/0x60 kernel/sched/idle.c:430
 start_secondary+0x21d/0x2d0 arch/x86/kernel/smpboot.c:312
 common_startup_64+0x13e/0x148
 </TASK>


Tested on:

commit:         0f61b186 Linux 6.19-rc5
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=102239fc580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=1859476832863c41
dashboard link: https://syzkaller.appspot.com/bug?extid=d8d4c31d40f868eaea30
compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=167a399a580000




* Re: [syzbot] [mm?] INFO: rcu detected stall in purge_vmap_node
  2026-01-12 13:13   ` [syzbot] [mm?] INFO: rcu detected stall in purge_vmap_node syzbot
@ 2026-01-12 13:38     ` Hillf Danton
  2026-01-12 14:50       ` Deepanshu Kartikey
  0 siblings, 1 reply; 7+ messages in thread
From: Hillf Danton @ 2026-01-12 13:38 UTC (permalink / raw)
  To: kartikey406; +Cc: syzbot, akpm, linux-kernel, linux-mm, syzkaller-bugs, urezki

> Date: Mon, 12 Jan 2026 05:13:02 -0800
> Hello,
> 
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> INFO: rcu detected stall in x64_sys_call
> 
Given the test result of your patch, can you specify the root cause of the
stall reported, Deepanshu?



* Re: [syzbot] [mm?] INFO: rcu detected stall in purge_vmap_node
  2026-01-12 13:38     ` Hillf Danton
@ 2026-01-12 14:50       ` Deepanshu Kartikey
  0 siblings, 0 replies; 7+ messages in thread
From: Deepanshu Kartikey @ 2026-01-12 14:50 UTC (permalink / raw)
  To: Hillf Danton; +Cc: syzbot, akpm, linux-kernel, linux-mm, syzkaller-bugs, urezki

On Mon, Jan 13, 2026 at 07:08:XX, Hillf Danton wrote:
> Given the test result of your patch, can you specify the root cause
> of the stall reported, Deepanshu?

Hi Hillf,

Thank you for the question. Looking at the stall in the test log, this is
occurring in a different code path from what my patch addresses:

My patch fixes:
  kasan_release_vmalloc_node+0x1ba/0xad0 mm/vmalloc.c:2299
  purge_vmap_node+0x1ba/0xad0

New stall location:
  __reset_page_owner+0x84/0x1a0
  __free_frozen_pages+0x7df/0x1170
  vfree+0x1fd/0xb50
  cleanup_vm_area_work+0x4c/0x100

The root cause pattern is similar (CONFIG_PAGE_OWNER stack unwinding under
RCU read lock), but it manifests in the page-freeing path rather than the
KASAN shadow cleanup path.

My patch specifically addresses the unbounded loop in
kasan_release_vmalloc_node(), where we iterate through large purge_lists.
The new stall appears to be in __reset_page_owner() during page freeing,
which would need a separate fix in that code path.

Should I proceed with the vmalloc fix as-is, or would you prefer I
investigate the page_owner stall as well?

Best regards,
Deepanshu



* [syzbot] [mm?] INFO: rcu detected stall in purge_vmap_node
@ 2026-01-12  5:33 syzbot
  0 siblings, 0 replies; 7+ messages in thread
From: syzbot @ 2026-01-12  5:33 UTC (permalink / raw)
  To: akpm, hannes, jackmanb, linux-kernel, linux-mm, mhocko, surenb,
	syzkaller-bugs, vbabka, ziy

Hello,

syzbot found the following issue on:

HEAD commit:    f0b9d8eb98df Merge tag 'nfsd-6.19-3' of git://git.kernel.o..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=135c11fc580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=a11e0f726bfb6765
dashboard link: https://syzkaller.appspot.com/bug?extid=d8d4c31d40f868eaea30
compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12e0c19a580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=134d1f92580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/4dbba2a806a3/disk-f0b9d8eb.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/2a52c0f94de7/vmlinux-f0b9d8eb.xz
kernel image: https://storage.googleapis.com/syzbot-assets/5ddf9a24988b/bzImage-f0b9d8eb.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d8d4c31d40f868eaea30@syzkaller.appspotmail.com

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-1): P6229/1:b..l
rcu: 	(detected by 1, t=10502 jiffies, g=10385, q=373 ncpus=2)
task:kworker/0:17    state:R  running task     stack:28840 pid:6229  tgid:6229  ppid:2      task_flags:0x4208060 flags:0x00080000
Workqueue: events purge_vmap_node
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5256 [inline]
 __schedule+0x1139/0x6150 kernel/sched/core.c:6863
 preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7190
 irqentry_exit+0x1d8/0x8c0 kernel/entry/common.c:216
 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
RIP: 0010:lock_acquire+0x62/0x330 kernel/locking/lockdep.c:5872
Code: b4 18 12 83 f8 07 0f 87 a2 02 00 00 89 c0 48 0f a3 05 22 bd ee 0e 0f 82 74 02 00 00 8b 35 ba ed ee 0e 85 f6 0f 85 8d 00 00 00 <48> 8b 44 24 30 65 48 2b 05 39 b4 18 12 0f 85 ad 02 00 00 48 83 c4
RSP: 0018:ffffc900035e7540 EFLAGS: 00000206
RAX: 0000000000000046 RBX: ffffffff8e3c96a0 RCX: 0000000019a1310f
RDX: 0000000000000000 RSI: ffffffff8daa7f9d RDI: ffffffff8bf2b380
RBP: 0000000000000002 R08: 000000007c8d0f89 R09: 0000000097c8d0f8
R10: 0000000000000002 R11: ffff888031598b30 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
 rcu_read_lock include/linux/rcupdate.h:867 [inline]
 class_rcu_constructor include/linux/rcupdate.h:1195 [inline]
 unwind_next_frame+0xd1/0x20b0 arch/x86/kernel/unwind_orc.c:495
 __unwind_start+0x45f/0x7f0 arch/x86/kernel/unwind_orc.c:773
 unwind_start arch/x86/include/asm/unwind.h:64 [inline]
 arch_stack_walk+0x73/0x100 arch/x86/kernel/stacktrace.c:24
 stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
 save_stack+0x160/0x1f0 mm/page_owner.c:165
 __reset_page_owner+0x84/0x1a0 mm/page_owner.c:320
 reset_page_owner include/linux/page_owner.h:25 [inline]
 free_pages_prepare mm/page_alloc.c:1406 [inline]
 __free_frozen_pages+0x7df/0x1170 mm/page_alloc.c:2943
 kasan_depopulate_vmalloc_pte+0x5b/0x80 mm/kasan/shadow.c:484
 apply_to_pte_range mm/memory.c:3182 [inline]
 apply_to_pmd_range mm/memory.c:3226 [inline]
 apply_to_pud_range mm/memory.c:3262 [inline]
 apply_to_p4d_range mm/memory.c:3298 [inline]
 __apply_to_page_range+0xac1/0x13f0 mm/memory.c:3334
 __kasan_release_vmalloc+0xd1/0xe0 mm/kasan/shadow.c:602
 kasan_release_vmalloc include/linux/kasan.h:593 [inline]
 kasan_release_vmalloc_node mm/vmalloc.c:2282 [inline]
 purge_vmap_node+0x1ba/0xad0 mm/vmalloc.c:2299
 process_one_work+0x9ba/0x1b20 kernel/workqueue.c:3257
 process_scheduled_works kernel/workqueue.c:3340 [inline]
 worker_thread+0x6c8/0xf10 kernel/workqueue.c:3421
 kthread+0x3c5/0x780 kernel/kthread.c:463
 ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
 </TASK>
rcu: rcu_preempt kthread starved for 10534 jiffies! g10385 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:R  running task     stack:28440 pid:16    tgid:16    ppid:2      task_flags:0x208040 flags:0x00080000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5256 [inline]
 __schedule+0x1139/0x6150 kernel/sched/core.c:6863
 __schedule_loop kernel/sched/core.c:6945 [inline]
 schedule+0xe7/0x3a0 kernel/sched/core.c:6960
 schedule_timeout+0x123/0x290 kernel/time/sleep_timeout.c:99
 rcu_gp_fqs_loop+0x1ea/0xaf0 kernel/rcu/tree.c:2083
 rcu_gp_kthread+0x26d/0x380 kernel/rcu/tree.c:2285
 kthread+0x3c5/0x780 kernel/kthread.c:463
 ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
 </TASK>
rcu: Stack dump where RCU GP kthread last ran:
CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
RIP: 0010:pv_native_safe_halt+0xf/0x20 arch/x86/kernel/paravirt.c:82
Code: b6 5f 02 c3 cc cc cc cc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d 13 39 12 00 fb f4 <e9> cc 35 03 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90
RSP: 0000:ffffc90000197de8 EFLAGS: 000002c6
RAX: 00000000000f349b RBX: 0000000000000001 RCX: ffffffff8b7826d9
RDX: 0000000000000000 RSI: ffffffff8dace031 RDI: ffffffff8bf2b380
RBP: ffffed1003b58498 R08: 0000000000000001 R09: ffffed10170a673d
R10: ffff8880b85339eb R11: ffff88801dac2ff0 R12: 0000000000000001
R13: ffff88801dac24c0 R14: ffffffff9088b8d0 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8881249f5000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055555f9257e0 CR3: 000000004b1c0000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 arch_safe_halt arch/x86/include/asm/paravirt.h:107 [inline]
 default_idle+0x13/0x20 arch/x86/kernel/process.c:767
 default_idle_call+0x6c/0xb0 kernel/sched/idle.c:122
 cpuidle_idle_call kernel/sched/idle.c:191 [inline]
 do_idle+0x38d/0x510 kernel/sched/idle.c:332
 cpu_startup_entry+0x4f/0x60 kernel/sched/idle.c:430
 start_secondary+0x21d/0x2d0 arch/x86/kernel/smpboot.c:312
 common_startup_64+0x13e/0x148
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

