Re: [syzbot] [mm?] WARNING: bad unlock balance in copy_process

linux-mm.kvack.org archive mirror

* Re: [syzbot] [mm?] WARNING: bad unlock balance in copy_process
       [not found] <683adb33.a70a0220.1a6ae.000b.GAE@google.com>
@ 2025-09-17 20:40 ` syzbot
  2025-09-18  8:35   ` Vlastimil Babka
  0 siblings, 1 reply; 5+ messages in thread
From: syzbot @ 2025-09-17 20:40 UTC
  To: Liam.Howlett, akpm, bsegall, david, dietmar.eggemann, juri.lelli,
	kees, linux-kernel, linux-mm, lorenzo.stoakes, mgorman, mhocko,
	mingo, peterz, rostedt, rppt, surenb, syzkaller-bugs, vbabka,
	vincent.guittot, vschneid

syzbot has found a reproducer for the following issue on:

HEAD commit:    6edf2885ebeb Merge branch 'for-next/core' into for-kernelci
git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=16d14c7c580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=b8b6789b42526d72
dashboard link: https://syzkaller.appspot.com/bug?extid=80cb3cc5c14fad191a10
compiler:       Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
userspace arch: arm64
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=179d9f62580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11d14c7c580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/c72239eb6d76/disk-6edf2885.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/b67e9820b2be/vmlinux-6edf2885.xz
kernel image: https://storage.googleapis.com/syzbot-assets/0c4ab7e562f6/Image-6edf2885.gz.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+80cb3cc5c14fad191a10@syzkaller.appspotmail.com

=====================================
WARNING: bad unlock balance detected!
syzkaller #0 Not tainted
-------------------------------------
syz.1.48/6865 is trying to release lock (&sighand->siglock) at:
[<ffff8000803b8634>] spin_unlock include/linux/spinlock.h:391 [inline]
[<ffff8000803b8634>] copy_process+0x22d4/0x31ec kernel/fork.c:2432
but there are no more locks to release!

other info that might help us debug this:
1 lock held by syz.1.48/6865:
 #0: ffff80008fa00450 (cgroup_threadgroup_rwsem){++++}-{0:0}, at: copy_process+0x2228/0x31ec kernel/fork.c:2274

stack backtrace:
CPU: 0 UID: 0 PID: 6865 Comm: syz.1.48 Not tainted syzkaller #0 PREEMPT 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/30/2025
Call trace:
 show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:499 (C)
 __dump_stack+0x30/0x40 lib/dump_stack.c:94
 dump_stack_lvl+0xd8/0x12c lib/dump_stack.c:120
 dump_stack+0x1c/0x28 lib/dump_stack.c:129
 print_unlock_imbalance_bug+0xf4/0xfc kernel/locking/lockdep.c:5298
 __lock_release kernel/locking/lockdep.c:-1 [inline]
 lock_release+0x244/0x39c kernel/locking/lockdep.c:5889
 __raw_spin_unlock include/linux/spinlock_api_smp.h:141 [inline]
 _raw_spin_unlock+0x24/0x78 kernel/locking/spinlock.c:186
 spin_unlock include/linux/spinlock.h:391 [inline]
 copy_process+0x22d4/0x31ec kernel/fork.c:2432
 kernel_clone+0x1d8/0x84c kernel/fork.c:2605
 __do_sys_clone kernel/fork.c:2748 [inline]
 __se_sys_clone kernel/fork.c:2716 [inline]
 __arm64_sys_clone+0x144/0x1a0 kernel/fork.c:2716
 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
 invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
 el0_svc+0x5c/0x254 arch/arm64/kernel/entry-common.c:744
 el0t_64_sync_handler+0x84/0x12c arch/arm64/kernel/entry-common.c:763
 el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:596


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.



* Re: [syzbot] [mm?] WARNING: bad unlock balance in copy_process
  2025-09-17 20:40 ` [syzbot] [mm?] WARNING: bad unlock balance in copy_process syzbot
@ 2025-09-18  8:35   ` Vlastimil Babka
  2025-09-18  8:48     ` Sebastian Andrzej Siewior
  2025-09-18 13:09     ` [PATCH] futex: Use correct exit on failure from futex_hash_allocate_default() Sebastian Andrzej Siewior
  0 siblings, 2 replies; 5+ messages in thread
From: Vlastimil Babka @ 2025-09-18  8:35 UTC
  To: syzbot, Liam.Howlett, akpm, bsegall, david, dietmar.eggemann,
	juri.lelli, kees, linux-kernel, linux-mm, lorenzo.stoakes,
	mgorman, mhocko, mingo, peterz, rostedt, rppt, surenb,
	syzkaller-bugs, vincent.guittot, vschneid,
	Sebastian Andrzej Siewior

On 9/17/25 22:40, syzbot wrote:
> syzbot has found a reproducer for the following issue on:
> 
> HEAD commit:    6edf2885ebeb Merge branch 'for-next/core' into for-kernelci
> git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
> console output: https://syzkaller.appspot.com/x/log.txt?x=16d14c7c580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=b8b6789b42526d72
> dashboard link: https://syzkaller.appspot.com/bug?extid=80cb3cc5c14fad191a10
> compiler:       Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
> userspace arch: arm64
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=179d9f62580000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11d14c7c580000
> 
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/c72239eb6d76/disk-6edf2885.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/b67e9820b2be/vmlinux-6edf2885.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/0c4ab7e562f6/Image-6edf2885.gz.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+80cb3cc5c14fad191a10@syzkaller.appspotmail.com
> 
> =====================================
> WARNING: bad unlock balance detected!
> syzkaller #0 Not tainted
> -------------------------------------
> syz.1.48/6865 is trying to release lock (&sighand->siglock) at:
> [<ffff8000803b8634>] spin_unlock include/linux/spinlock.h:391 [inline]
> [<ffff8000803b8634>] copy_process+0x22d4/0x31ec kernel/fork.c:2432

bad_fork_core_free:
        sched_core_free(p);
        spin_unlock(&current->sighand->siglock); <- here

Sebastian, I think it's your 7c4f75a21f63 ("futex: Allow automatic
allocation of process wide futex hash") adding a "goto bad_fork_core_free;"
from a place that doesn't yet have current->sighand->siglock locked?
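
For readers unfamiliar with the warning: lockdep keeps a per-task list of held locks and complains when a release names a lock that is not on that list. A minimal user-space sketch of that check follows. The names `struct held_locks`, `track_lock()` and `track_unlock()` are hypothetical and made up for illustration; they are not the kernel's lockdep API, and the real implementation is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a per-task held-lock list. */
#define MAX_HELD 8

struct held_locks {
	const char *name[MAX_HELD];
	size_t depth;
};

/* Record an acquisition; returns false if the table is full. */
static bool track_lock(struct held_locks *h, const char *name)
{
	assert(h != NULL && name != NULL);
	if (h->depth == MAX_HELD)
		return false;
	h->name[h->depth++] = name;
	return true;
}

/*
 * Record a release. Returns false when the lock is not in the held
 * list, which is the condition the report calls "bad unlock balance".
 */
static bool track_unlock(struct held_locks *h, const char *name)
{
	assert(h != NULL && name != NULL);
	/* Search from the most recently taken lock downwards. */
	for (size_t i = h->depth; i-- > 0; ) {
		if (strcmp(h->name[i], name) == 0) {
			/* Close the gap left by the released entry. */
			for (size_t j = i; j + 1 < h->depth; j++)
				h->name[j] = h->name[j + 1];
			h->depth--;
			return true;
		}
	}
	return false;
}
```

With only cgroup_threadgroup_rwsem tracked, a call such as `track_unlock(&h, "siglock")` returns false in this model, which corresponds to the "1 lock held" state shown in the report.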

> but there are no more locks to release!
> 
> other info that might help us debug this:
> 1 lock held by syz.1.48/6865:
>  #0: ffff80008fa00450 (cgroup_threadgroup_rwsem){++++}-{0:0}, at: copy_process+0x2228/0x31ec kernel/fork.c:2274
> 
> stack backtrace:
> CPU: 0 UID: 0 PID: 6865 Comm: syz.1.48 Not tainted syzkaller #0 PREEMPT 
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/30/2025
> Call trace:
>  show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:499 (C)
>  __dump_stack+0x30/0x40 lib/dump_stack.c:94
>  dump_stack_lvl+0xd8/0x12c lib/dump_stack.c:120
>  dump_stack+0x1c/0x28 lib/dump_stack.c:129
>  print_unlock_imbalance_bug+0xf4/0xfc kernel/locking/lockdep.c:5298
>  __lock_release kernel/locking/lockdep.c:-1 [inline]
>  lock_release+0x244/0x39c kernel/locking/lockdep.c:5889
>  __raw_spin_unlock include/linux/spinlock_api_smp.h:141 [inline]
>  _raw_spin_unlock+0x24/0x78 kernel/locking/spinlock.c:186
>  spin_unlock include/linux/spinlock.h:391 [inline]
>  copy_process+0x22d4/0x31ec kernel/fork.c:2432
>  kernel_clone+0x1d8/0x84c kernel/fork.c:2605
>  __do_sys_clone kernel/fork.c:2748 [inline]
>  __se_sys_clone kernel/fork.c:2716 [inline]
>  __arm64_sys_clone+0x144/0x1a0 kernel/fork.c:2716
>  __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
>  invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
>  el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
>  do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
>  el0_svc+0x5c/0x254 arch/arm64/kernel/entry-common.c:744
>  el0t_64_sync_handler+0x84/0x12c arch/arm64/kernel/entry-common.c:763
>  el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:596
> 
> 
> ---
> If you want syzbot to run the reproducer, reply with:
> #syz test: git://repo/address.git branch-or-commit-hash
> If you attach or paste a git patch, syzbot will apply it before testing.




* Re: [syzbot] [mm?] WARNING: bad unlock balance in copy_process
  2025-09-18  8:35   ` Vlastimil Babka
@ 2025-09-18  8:48     ` Sebastian Andrzej Siewior
  2025-09-18 13:09     ` [PATCH] futex: Use correct exit on failure from futex_hash_allocate_default() Sebastian Andrzej Siewior
  1 sibling, 0 replies; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-09-18  8:48 UTC
  To: Vlastimil Babka
  Cc: syzbot, Liam.Howlett, akpm, bsegall, david, dietmar.eggemann,
	juri.lelli, kees, linux-kernel, linux-mm, lorenzo.stoakes,
	mgorman, mhocko, mingo, peterz, rostedt, rppt, surenb,
	syzkaller-bugs, vincent.guittot, vschneid

On 2025-09-18 10:35:24 [+0200], Vlastimil Babka wrote:
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+80cb3cc5c14fad191a10@syzkaller.appspotmail.com
> > 
> > =====================================
> > WARNING: bad unlock balance detected!
> > syzkaller #0 Not tainted
> > -------------------------------------
> > syz.1.48/6865 is trying to release lock (&sighand->siglock) at:
> > [<ffff8000803b8634>] spin_unlock include/linux/spinlock.h:391 [inline]
> > [<ffff8000803b8634>] copy_process+0x22d4/0x31ec kernel/fork.c:2432
> 
> bad_fork_core_free:
>         sched_core_free(p);
>         spin_unlock(&current->sighand->siglock); <- here
> 
> Sebastian, I think it's your 7c4f75a21f63 ("futex: Allow automatic
> allocation of process wide futex hash") adding a "goto bad_fork_core_free;"
> from a place that doesn't yet have current->sighand->siglock locked?

Yes. Judging from -rc6, if futex_hash_allocate_default() fails we hold
neither siglock nor tasklist_lock. sched_core_free() also looks wrong, as
the cookie is only allocated later, in sched_core_fork(). sched_cgroup_fork()
does nothing special. So it should be

diff --git a/kernel/fork.c b/kernel/fork.c
index c4ada32598bd5..6ca8689a83b5b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2295,7 +2295,7 @@ __latent_entropy struct task_struct *copy_process(
 	if (need_futex_hash_allocate_default(clone_flags)) {
 		retval = futex_hash_allocate_default();
 		if (retval)
-			goto bad_fork_core_free;
+			goto bad_fork_cancel_cgroup;
 		/*
 		 * If we fail beyond this point we don't free the allocated
 		 * futex hash map. We assume that another thread will be created

Sebastian



* [PATCH] futex: Use correct exit on failure from futex_hash_allocate_default()
  2025-09-18  8:35   ` Vlastimil Babka
  2025-09-18  8:48     ` Sebastian Andrzej Siewior
@ 2025-09-18 13:09     ` Sebastian Andrzej Siewior
  2025-09-18 15:30       ` Steven Rostedt
  1 sibling, 1 reply; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-09-18 13:09 UTC
  To: Vlastimil Babka, Thomas Gleixner, Peter Zijlstra
  Cc: syzbot, Liam.Howlett, akpm, bsegall, david, dietmar.eggemann,
	juri.lelli, kees, linux-kernel, linux-mm, lorenzo.stoakes,
	mgorman, mhocko, mingo, peterz, rostedt, rppt, surenb,
	syzkaller-bugs, vincent.guittot, vschneid

copy_process() uses the wrong error exit path from
futex_hash_allocate_default().
When futex_hash_allocate_default() fails, neither tasklist_lock
nor siglock has been acquired. The exit label bad_fork_core_free unlocks
both of these locks, which is wrong.

The previous label, bad_fork_cancel_cgroup, is the correct exit.
sched_cgroup_fork() did not allocate any resources that need to be freed.

Use bad_fork_cancel_cgroup on error exit from
futex_hash_allocate_default().

Fixes: 7c4f75a21f636 ("futex: Allow automatic allocation of process wide futex hash")
Reported-by: syzbot+80cb3cc5c14fad191a10@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68cb1cbd.050a0220.2ff435.0599.GAE@google.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---

That private-futex code was marked BROKEN in v6.16 and re-enabled in
v6.17. It could use
  56180dd20c19e ("futex: Use RCU-based per-CPU reference counting instead of rcuref_t")

as Fixes: instead to avoid backporting to v6.16.

 kernel/fork.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index c4ada32598bd5..6ca8689a83b5b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2295,7 +2295,7 @@ __latent_entropy struct task_struct *copy_process(
 	if (need_futex_hash_allocate_default(clone_flags)) {
 		retval = futex_hash_allocate_default();
 		if (retval)
-			goto bad_fork_core_free;
+			goto bad_fork_cancel_cgroup;
 		/*
 		 * If we fail beyond this point we don't free the allocated
 		 * futex hash map. We assume that another thread will be created
-- 
2.51.0




* Re: [PATCH] futex: Use correct exit on failure from futex_hash_allocate_default()
  2025-09-18 13:09     ` [PATCH] futex: Use correct exit on failure from futex_hash_allocate_default() Sebastian Andrzej Siewior
@ 2025-09-18 15:30       ` Steven Rostedt
  0 siblings, 0 replies; 5+ messages in thread
From: Steven Rostedt @ 2025-09-18 15:30 UTC
  To: Sebastian Andrzej Siewior
  Cc: Vlastimil Babka, Thomas Gleixner, Peter Zijlstra, syzbot,
	Liam.Howlett, akpm, bsegall, david, dietmar.eggemann, juri.lelli,
	kees, linux-kernel, linux-mm, lorenzo.stoakes, mgorman, mhocko,
	mingo, rppt, surenb, syzkaller-bugs, vincent.guittot, vschneid

On Thu, 18 Sep 2025 15:09:45 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> copy_process() uses the wrong error exit path from
> futex_hash_allocate_default().
> When futex_hash_allocate_default() fails, neither tasklist_lock
> nor siglock has been acquired. The exit label bad_fork_core_free unlocks
> both of these locks, which is wrong.
> 
> The previous label, bad_fork_cancel_cgroup, is the correct exit.
> sched_cgroup_fork() did not allocate any resources that need to be freed.
> 
> Use bad_fork_cancel_cgroup on error exit from
> futex_hash_allocate_default().

	if (need_futex_hash_allocate_default(clone_flags)) {
		retval = futex_hash_allocate_default();
		if (retval)
			goto bad_fork_core_free;
		[..]
	}
	[..]
	write_lock_irq(&tasklist_lock);
	[..]
	klp_copy_process(p);

	sched_core_fork(p);

	spin_lock(&current->sighand->siglock);

	[..]

 bad_fork_core_free:
	sched_core_free(p);
	spin_unlock(&current->sighand->siglock);
	write_unlock_irq(&tasklist_lock);
 bad_fork_cancel_cgroup:
	cgroup_cancel_fork(p, args);

Yep, looks bad to me!
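
The control flow quoted above can be modelled in user space to show why the choice of label matters. The sketch below is a toy under stated assumptions, not kernel code: `struct fork_state`, `enum exit_label` and `unwind_after_futex_failure()` are hypothetical names, and plain counters stand in for tasklist_lock and siglock (incremented on lock, decremented on unlock).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: counters stand in for the two locks. */
struct fork_state {
	int tasklist_lock;      /* +1 on lock, -1 on unlock */
	int siglock;            /* +1 on lock, -1 on unlock */
	bool cgroup_cancelled;  /* set by the modelled cgroup_cancel_fork() */
};

enum exit_label { EXIT_CORE_FREE, EXIT_CANCEL_CGROUP };

/*
 * Simulate a failing futex hash allocation and unwind through `label`.
 * Returns true if both lock counters are balanced (zero) afterwards.
 */
static bool unwind_after_futex_failure(struct fork_state *s,
				       enum exit_label label)
{
	assert(s != NULL);

	/* The failure happens before either lock has been taken. */
	if (label == EXIT_CORE_FREE)
		goto bad_fork_core_free;
	goto bad_fork_cancel_cgroup;

bad_fork_core_free:
	s->siglock--;        /* models spin_unlock(&current->sighand->siglock) */
	s->tasklist_lock--;  /* models write_unlock_irq(&tasklist_lock) */
bad_fork_cancel_cgroup:
	s->cgroup_cancelled = true;  /* models cgroup_cancel_fork(p, args) */
	return s->siglock == 0 && s->tasklist_lock == 0;
}
```

In this model, unwinding through `EXIT_CORE_FREE` leaves both counters at -1 (two unlocks without a matching lock), while `EXIT_CANCEL_CGROUP` leaves them at zero and still performs the cgroup cleanup, which is what the patch switches to.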

Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>

-- Steve


> 
> Fixes: 7c4f75a21f636 ("futex: Allow automatic allocation of process wide futex hash")
> Reported-by: syzbot+80cb3cc5c14fad191a10@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/68cb1cbd.050a0220.2ff435.0599.GAE@google.com
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>


