* Re: [PATCH] bcachefs: Use alloc_percpu_gfp to avoid deadlock
[not found] <20250212100625.55860-1-mmpgouride@gmail.com>
@ 2025-02-12 14:27 ` Kent Overstreet
2025-02-20 10:57 ` Alan Huang
0 siblings, 1 reply; 9+ messages in thread
From: Kent Overstreet @ 2025-02-12 14:27 UTC (permalink / raw)
To: Alan Huang
Cc: linux-bcachefs, syzbot+fe63f377148a6371a9db, linux-mm, Tejun Heo,
Dennis Zhou, Christoph Lameter
Adding pcpu people to the CC
On Wed, Feb 12, 2025 at 06:06:25PM +0800, Alan Huang wrote:
> The cycle:
>
> CPU0:              CPU1:
> bc->lock           pcpu_alloc_mutex
> pcpu_alloc_mutex   bc->lock
>
> Reported-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
> Tested-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
So pcpu_alloc_mutex -> fs_reclaim?
That's really awkward; seems like something that might invite more
issues. We can apply your fix if we need to, but I want to hear what the
percpu people have to say first.
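For anyone skimming the report, the edge in question comes from the percpu
balance worker allocating chunk metadata with GFP_KERNEL while it holds
pcpu_alloc_mutex (trace #1 below). A schematic of that shape, using made-up
names rather than the actual mm/percpu.c code:

#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

static DEFINE_MUTEX(demo_pcpu_alloc_mutex);     /* stands in for pcpu_alloc_mutex */

static void demo_balance_workfn(struct work_struct *work)
{
        void *chunk_md;

        mutex_lock(&demo_pcpu_alloc_mutex);

        /* pcpu_create_chunk() -> pcpu_alloc_chunk() -> pcpu_mem_zalloc() */
        chunk_md = kzalloc(64, GFP_KERNEL);     /* GFP_KERNEL may enter fs_reclaim */
        kfree(chunk_md);

        mutex_unlock(&demo_pcpu_alloc_mutex);
}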
======================================================
WARNING: possible circular locking dependency detected
6.14.0-rc2-syzkaller-00039-g09fbf3d50205 #0 Not tainted
------------------------------------------------------
syz.0.21/5625 is trying to acquire lock:
ffffffff8ea19608 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
but task is already holding lock:
ffff888051401c68 (&bc->lock){+.+.}-{4:4}, at: bch2_btree_node_mem_alloc+0x559/0x16f0 fs/bcachefs/btree_cache.c:804
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&bc->lock){+.+.}-{4:4}:
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
__mutex_lock_common kernel/locking/mutex.c:585 [inline]
__mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
bch2_btree_cache_scan+0x184/0xec0 fs/bcachefs/btree_cache.c:482
do_shrink_slab+0x72d/0x1160 mm/shrinker.c:437
shrink_slab+0x1093/0x14d0 mm/shrinker.c:664
shrink_one+0x43b/0x850 mm/vmscan.c:4868
shrink_many mm/vmscan.c:4929 [inline]
lru_gen_shrink_node mm/vmscan.c:5007 [inline]
shrink_node+0x37c5/0x3e50 mm/vmscan.c:5978
kswapd_shrink_node mm/vmscan.c:6807 [inline]
balance_pgdat mm/vmscan.c:6999 [inline]
kswapd+0x20f3/0x3b10 mm/vmscan.c:7264
kthread+0x7a9/0x920 kernel/kthread.c:464
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
-> #1 (fs_reclaim){+.+.}-{0:0}:
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
__fs_reclaim_acquire mm/page_alloc.c:3853 [inline]
fs_reclaim_acquire+0x88/0x130 mm/page_alloc.c:3867
might_alloc include/linux/sched/mm.h:318 [inline]
slab_pre_alloc_hook mm/slub.c:4066 [inline]
slab_alloc_node mm/slub.c:4144 [inline]
__do_kmalloc_node mm/slub.c:4293 [inline]
__kmalloc_noprof+0xae/0x4c0 mm/slub.c:4306
kmalloc_noprof include/linux/slab.h:905 [inline]
kzalloc_noprof include/linux/slab.h:1037 [inline]
pcpu_mem_zalloc mm/percpu.c:510 [inline]
pcpu_alloc_chunk mm/percpu.c:1430 [inline]
pcpu_create_chunk+0x57/0xbc0 mm/percpu-vm.c:338
pcpu_balance_populated mm/percpu.c:2063 [inline]
pcpu_balance_workfn+0xc4d/0xd40 mm/percpu.c:2200
process_one_work kernel/workqueue.c:3236 [inline]
process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3317
worker_thread+0x870/0xd30 kernel/workqueue.c:3398
kthread+0x7a9/0x920 kernel/kthread.c:464
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
-> #0 (pcpu_alloc_mutex){+.+.}-{4:4}:
check_prev_add kernel/locking/lockdep.c:3163 [inline]
check_prevs_add kernel/locking/lockdep.c:3282 [inline]
validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3906
__lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5228
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
__mutex_lock_common kernel/locking/mutex.c:585 [inline]
__mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
__six_lock_init+0x104/0x150 fs/bcachefs/six.c:876
bch2_btree_lock_init+0x38/0x100 fs/bcachefs/btree_locking.c:12
bch2_btree_node_mem_alloc+0x565/0x16f0 fs/bcachefs/btree_cache.c:807
__bch2_btree_node_alloc fs/bcachefs/btree_update_interior.c:304 [inline]
bch2_btree_reserve_get+0x2df/0x1890 fs/bcachefs/btree_update_interior.c:532
bch2_btree_update_start+0xe56/0x14e0 fs/bcachefs/btree_update_interior.c:1230
bch2_btree_split_leaf+0x121/0x880 fs/bcachefs/btree_update_interior.c:1851
bch2_trans_commit_error+0x212/0x1380 fs/bcachefs/btree_trans_commit.c:908
__bch2_trans_commit+0x812b/0x97a0 fs/bcachefs/btree_trans_commit.c:1085
bch2_trans_commit fs/bcachefs/btree_update.h:183 [inline]
bch2_trans_mark_metadata_bucket+0x47a/0x17b0 fs/bcachefs/buckets.c:1043
bch2_trans_mark_metadata_sectors fs/bcachefs/buckets.c:1060 [inline]
__bch2_trans_mark_dev_sb fs/bcachefs/buckets.c:1100 [inline]
bch2_trans_mark_dev_sb+0x3f6/0x820 fs/bcachefs/buckets.c:1128
bch2_trans_mark_dev_sbs_flags+0x6be/0x720 fs/bcachefs/buckets.c:1138
bch2_fs_initialize+0xba0/0x1610 fs/bcachefs/recovery.c:1149
bch2_fs_start+0x36d/0x610 fs/bcachefs/super.c:1042
bch2_fs_get_tree+0xd8d/0x1740 fs/bcachefs/fs.c:2203
vfs_get_tree+0x90/0x2b0 fs/super.c:1814
do_new_mount+0x2be/0xb40 fs/namespace.c:3560
do_mount fs/namespace.c:3900 [inline]
__do_sys_mount fs/namespace.c:4111 [inline]
__se_sys_mount+0x2d6/0x3c0 fs/namespace.c:4088
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
other info that might help us debug this:
Chain exists of:
pcpu_alloc_mutex --> fs_reclaim --> &bc->lock
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&bc->lock);
                               lock(fs_reclaim);
                               lock(&bc->lock);
  lock(pcpu_alloc_mutex);
*** DEADLOCK ***
4 locks held by syz.0.21/5625:
#0: ffff888051400278 (&c->state_lock){+.+.}-{4:4}, at: bch2_fs_start+0x45/0x610 fs/bcachefs/super.c:1010
#1: ffff888051404378 (&c->btree_trans_barrier){.+.+}-{0:0}, at: srcu_lock_acquire include/linux/srcu.h:164 [inline]
#1: ffff888051404378 (&c->btree_trans_barrier){.+.+}-{0:0}, at: srcu_read_lock include/linux/srcu.h:256 [inline]
#1: ffff888051404378 (&c->btree_trans_barrier){.+.+}-{0:0}, at: __bch2_trans_get+0x7e4/0xd30 fs/bcachefs/btree_iter.c:3377
#2: ffff8880514266d0 (&c->gc_lock){.+.+}-{4:4}, at: bch2_btree_update_start+0x682/0x14e0 fs/bcachefs/btree_update_interior.c:1180
#3: ffff888051401c68 (&bc->lock){+.+.}-{4:4}, at: bch2_btree_node_mem_alloc+0x559/0x16f0 fs/bcachefs/btree_cache.c:804
stack backtrace:
CPU: 0 UID: 0 PID: 5625 Comm: syz.0.21 Not tainted 6.14.0-rc2-syzkaller-00039-g09fbf3d50205 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_circular_bug+0x13a/0x1b0 kernel/locking/lockdep.c:2076
check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2208
check_prev_add kernel/locking/lockdep.c:3163 [inline]
check_prevs_add kernel/locking/lockdep.c:3282 [inline]
validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3906
__lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5228
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
__mutex_lock_common kernel/locking/mutex.c:585 [inline]
__mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
__six_lock_init+0x104/0x150 fs/bcachefs/six.c:876
bch2_btree_lock_init+0x38/0x100 fs/bcachefs/btree_locking.c:12
bch2_btree_node_mem_alloc+0x565/0x16f0 fs/bcachefs/btree_cache.c:807
__bch2_btree_node_alloc fs/bcachefs/btree_update_interior.c:304 [inline]
bch2_btree_reserve_get+0x2df/0x1890 fs/bcachefs/btree_update_interior.c:532
bch2_btree_update_start+0xe56/0x14e0 fs/bcachefs/btree_update_interior.c:1230
bch2_btree_split_leaf+0x121/0x880 fs/bcachefs/btree_update_interior.c:1851
bch2_trans_commit_error+0x212/0x1380 fs/bcachefs/btree_trans_commit.c:908
__bch2_trans_commit+0x812b/0x97a0 fs/bcachefs/btree_trans_commit.c:1085
bch2_trans_commit fs/bcachefs/btree_update.h:183 [inline]
bch2_trans_mark_metadata_bucket+0x47a/0x17b0 fs/bcachefs/buckets.c:1043
bch2_trans_mark_metadata_sectors fs/bcachefs/buckets.c:1060 [inline]
__bch2_trans_mark_dev_sb fs/bcachefs/buckets.c:1100 [inline]
bch2_trans_mark_dev_sb+0x3f6/0x820 fs/bcachefs/buckets.c:1128
bch2_trans_mark_dev_sbs_flags+0x6be/0x720 fs/bcachefs/buckets.c:1138
bch2_fs_initialize+0xba0/0x1610 fs/bcachefs/recovery.c:1149
bch2_fs_start+0x36d/0x610 fs/bcachefs/super.c:1042
bch2_fs_get_tree+0xd8d/0x1740 fs/bcachefs/fs.c:2203
vfs_get_tree+0x90/0x2b0 fs/super.c:1814
do_new_mount+0x2be/0xb40 fs/namespace.c:3560
do_mount fs/namespace.c:3900 [inline]
__do_sys_mount fs/namespace.c:4111 [inline]
__se_sys_mount+0x2d6/0x3c0 fs/namespace.c:4088
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fcaed38e58a
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb a6 e8 de 1a 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fcaec5fde68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00007fcaec5fdef0 RCX: 00007fcaed38e58a
RDX: 00004000000000c0 RSI: 0000400000000180 RDI: 00007fcaec5fdeb0
RBP: 00004000000000c0 R08: 00007fcaec5fdef0 R09: 0000000000000000
> ---
> fs/bcachefs/six.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/bcachefs/six.c b/fs/bcachefs/six.c
> index 7e7c66a1e1a6..ccdc6d496910 100644
> --- a/fs/bcachefs/six.c
> +++ b/fs/bcachefs/six.c
> @@ -873,7 +873,7 @@ void __six_lock_init(struct six_lock *lock, const char *name,
> * failure if they wish by checking lock->readers, but generally
> * will not want to treat it as an error.
> */
> - lock->readers = alloc_percpu(unsigned);
> + lock->readers = alloc_percpu_gfp(unsigned, GFP_NOWAIT|__GFP_NOWARN);
> }
> #endif
> }
> --
> 2.47.0
>
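The pattern the patch relies on, for reference: the percpu allocation becomes
best-effort and the counter degrades to a plain shared atomic when it fails,
which is what the comment quoted in the hunk above already anticipates. A
minimal sketch with made-up names (demo_counter and friends), not bcachefs
code:

#include <linux/percpu.h>
#include <linux/atomic.h>
#include <linux/gfp.h>

struct demo_counter {
        unsigned __percpu *pcpu;        /* NULL if the NOWAIT allocation failed */
        atomic_t           fallback;
};

static void demo_counter_init(struct demo_counter *c)
{
        /* GFP_NOWAIT never enters reclaim, so no new lock ordering edge */
        c->pcpu = alloc_percpu_gfp(unsigned, GFP_NOWAIT|__GFP_NOWARN);
        atomic_set(&c->fallback, 0);
}

static void demo_counter_inc(struct demo_counter *c)
{
        if (c->pcpu)
                this_cpu_inc(*c->pcpu);
        else
                atomic_inc(&c->fallback);
}

The tradeoff is that a lock initialized under memory pressure keeps the atomic
fallback for its whole lifetime.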
* Re: [PATCH] bcachefs: Use alloc_percpu_gfp to avoid deadlock
2025-02-12 14:27 ` [PATCH] bcachefs: Use alloc_percpu_gfp to avoid deadlock Kent Overstreet
@ 2025-02-20 10:57 ` Alan Huang
2025-02-20 12:40 ` Kent Overstreet
2025-02-20 17:16 ` Vlastimil Babka
0 siblings, 2 replies; 9+ messages in thread
From: Alan Huang @ 2025-02-20 10:57 UTC (permalink / raw)
To: Kent Overstreet
Cc: linux-bcachefs, syzbot+fe63f377148a6371a9db, linux-mm, Tejun Heo,
Dennis Zhou, Christoph Lameter
Ping
> On Feb 12, 2025, at 22:27, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> Adding pcpu people to the CC
>
> On Wed, Feb 12, 2025 at 06:06:25PM +0800, Alan Huang wrote:
>> The cycle:
>>
>> CPU0:              CPU1:
>> bc->lock           pcpu_alloc_mutex
>> pcpu_alloc_mutex   bc->lock
>>
>> Reported-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
>> Tested-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
>> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
>
> So pcpu_alloc_mutex -> fs_reclaim?
>
> That's really awkward; seems like something that might invite more
> issues. We can apply your fix if we need to, but I want to hear what the
> percpu people have to say first.
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.14.0-rc2-syzkaller-00039-g09fbf3d50205 #0 Not tainted
> ------------------------------------------------------
> syz.0.21/5625 is trying to acquire lock:
> ffffffff8ea19608 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
>
> but task is already holding lock:
> ffff888051401c68 (&bc->lock){+.+.}-{4:4}, at: bch2_btree_node_mem_alloc+0x559/0x16f0 fs/bcachefs/btree_cache.c:804
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (&bc->lock){+.+.}-{4:4}:
> lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
> __mutex_lock_common kernel/locking/mutex.c:585 [inline]
> __mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
> bch2_btree_cache_scan+0x184/0xec0 fs/bcachefs/btree_cache.c:482
> do_shrink_slab+0x72d/0x1160 mm/shrinker.c:437
> shrink_slab+0x1093/0x14d0 mm/shrinker.c:664
> shrink_one+0x43b/0x850 mm/vmscan.c:4868
> shrink_many mm/vmscan.c:4929 [inline]
> lru_gen_shrink_node mm/vmscan.c:5007 [inline]
> shrink_node+0x37c5/0x3e50 mm/vmscan.c:5978
> kswapd_shrink_node mm/vmscan.c:6807 [inline]
> balance_pgdat mm/vmscan.c:6999 [inline]
> kswapd+0x20f3/0x3b10 mm/vmscan.c:7264
> kthread+0x7a9/0x920 kernel/kthread.c:464
> ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>
> -> #1 (fs_reclaim){+.+.}-{0:0}:
> lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
> __fs_reclaim_acquire mm/page_alloc.c:3853 [inline]
> fs_reclaim_acquire+0x88/0x130 mm/page_alloc.c:3867
> might_alloc include/linux/sched/mm.h:318 [inline]
> slab_pre_alloc_hook mm/slub.c:4066 [inline]
> slab_alloc_node mm/slub.c:4144 [inline]
> __do_kmalloc_node mm/slub.c:4293 [inline]
> __kmalloc_noprof+0xae/0x4c0 mm/slub.c:4306
> kmalloc_noprof include/linux/slab.h:905 [inline]
> kzalloc_noprof include/linux/slab.h:1037 [inline]
> pcpu_mem_zalloc mm/percpu.c:510 [inline]
> pcpu_alloc_chunk mm/percpu.c:1430 [inline]
> pcpu_create_chunk+0x57/0xbc0 mm/percpu-vm.c:338
> pcpu_balance_populated mm/percpu.c:2063 [inline]
> pcpu_balance_workfn+0xc4d/0xd40 mm/percpu.c:2200
> process_one_work kernel/workqueue.c:3236 [inline]
> process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3317
> worker_thread+0x870/0xd30 kernel/workqueue.c:3398
> kthread+0x7a9/0x920 kernel/kthread.c:464
> ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>
> -> #0 (pcpu_alloc_mutex){+.+.}-{4:4}:
> check_prev_add kernel/locking/lockdep.c:3163 [inline]
> check_prevs_add kernel/locking/lockdep.c:3282 [inline]
> validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3906
> __lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5228
> lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
> __mutex_lock_common kernel/locking/mutex.c:585 [inline]
> __mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
> pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
> __six_lock_init+0x104/0x150 fs/bcachefs/six.c:876
> bch2_btree_lock_init+0x38/0x100 fs/bcachefs/btree_locking.c:12
> bch2_btree_node_mem_alloc+0x565/0x16f0 fs/bcachefs/btree_cache.c:807
> __bch2_btree_node_alloc fs/bcachefs/btree_update_interior.c:304 [inline]
> bch2_btree_reserve_get+0x2df/0x1890 fs/bcachefs/btree_update_interior.c:532
> bch2_btree_update_start+0xe56/0x14e0 fs/bcachefs/btree_update_interior.c:1230
> bch2_btree_split_leaf+0x121/0x880 fs/bcachefs/btree_update_interior.c:1851
> bch2_trans_commit_error+0x212/0x1380 fs/bcachefs/btree_trans_commit.c:908
> __bch2_trans_commit+0x812b/0x97a0 fs/bcachefs/btree_trans_commit.c:1085
> bch2_trans_commit fs/bcachefs/btree_update.h:183 [inline]
> bch2_trans_mark_metadata_bucket+0x47a/0x17b0 fs/bcachefs/buckets.c:1043
> bch2_trans_mark_metadata_sectors fs/bcachefs/buckets.c:1060 [inline]
> __bch2_trans_mark_dev_sb fs/bcachefs/buckets.c:1100 [inline]
> bch2_trans_mark_dev_sb+0x3f6/0x820 fs/bcachefs/buckets.c:1128
> bch2_trans_mark_dev_sbs_flags+0x6be/0x720 fs/bcachefs/buckets.c:1138
> bch2_fs_initialize+0xba0/0x1610 fs/bcachefs/recovery.c:1149
> bch2_fs_start+0x36d/0x610 fs/bcachefs/super.c:1042
> bch2_fs_get_tree+0xd8d/0x1740 fs/bcachefs/fs.c:2203
> vfs_get_tree+0x90/0x2b0 fs/super.c:1814
> do_new_mount+0x2be/0xb40 fs/namespace.c:3560
> do_mount fs/namespace.c:3900 [inline]
> __do_sys_mount fs/namespace.c:4111 [inline]
> __se_sys_mount+0x2d6/0x3c0 fs/namespace.c:4088
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> other info that might help us debug this:
>
> Chain exists of:
> pcpu_alloc_mutex --> fs_reclaim --> &bc->lock
>
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&bc->lock);
>                                lock(fs_reclaim);
>                                lock(&bc->lock);
>   lock(pcpu_alloc_mutex);
>
> *** DEADLOCK ***
>
> 4 locks held by syz.0.21/5625:
> #0: ffff888051400278 (&c->state_lock){+.+.}-{4:4}, at: bch2_fs_start+0x45/0x610 fs/bcachefs/super.c:1010
> #1: ffff888051404378 (&c->btree_trans_barrier){.+.+}-{0:0}, at: srcu_lock_acquire include/linux/srcu.h:164 [inline]
> #1: ffff888051404378 (&c->btree_trans_barrier){.+.+}-{0:0}, at: srcu_read_lock include/linux/srcu.h:256 [inline]
> #1: ffff888051404378 (&c->btree_trans_barrier){.+.+}-{0:0}, at: __bch2_trans_get+0x7e4/0xd30 fs/bcachefs/btree_iter.c:3377
> #2: ffff8880514266d0 (&c->gc_lock){.+.+}-{4:4}, at: bch2_btree_update_start+0x682/0x14e0 fs/bcachefs/btree_update_interior.c:1180
> #3: ffff888051401c68 (&bc->lock){+.+.}-{4:4}, at: bch2_btree_node_mem_alloc+0x559/0x16f0 fs/bcachefs/btree_cache.c:804
>
> stack backtrace:
> CPU: 0 UID: 0 PID: 5625 Comm: syz.0.21 Not tainted 6.14.0-rc2-syzkaller-00039-g09fbf3d50205 #0
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:94 [inline]
> dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
> print_circular_bug+0x13a/0x1b0 kernel/locking/lockdep.c:2076
> check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2208
> check_prev_add kernel/locking/lockdep.c:3163 [inline]
> check_prevs_add kernel/locking/lockdep.c:3282 [inline]
> validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3906
> __lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5228
> lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
> __mutex_lock_common kernel/locking/mutex.c:585 [inline]
> __mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
> pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
> __six_lock_init+0x104/0x150 fs/bcachefs/six.c:876
> bch2_btree_lock_init+0x38/0x100 fs/bcachefs/btree_locking.c:12
> bch2_btree_node_mem_alloc+0x565/0x16f0 fs/bcachefs/btree_cache.c:807
> __bch2_btree_node_alloc fs/bcachefs/btree_update_interior.c:304 [inline]
> bch2_btree_reserve_get+0x2df/0x1890 fs/bcachefs/btree_update_interior.c:532
> bch2_btree_update_start+0xe56/0x14e0 fs/bcachefs/btree_update_interior.c:1230
> bch2_btree_split_leaf+0x121/0x880 fs/bcachefs/btree_update_interior.c:1851
> bch2_trans_commit_error+0x212/0x1380 fs/bcachefs/btree_trans_commit.c:908
> __bch2_trans_commit+0x812b/0x97a0 fs/bcachefs/btree_trans_commit.c:1085
> bch2_trans_commit fs/bcachefs/btree_update.h:183 [inline]
> bch2_trans_mark_metadata_bucket+0x47a/0x17b0 fs/bcachefs/buckets.c:1043
> bch2_trans_mark_metadata_sectors fs/bcachefs/buckets.c:1060 [inline]
> __bch2_trans_mark_dev_sb fs/bcachefs/buckets.c:1100 [inline]
> bch2_trans_mark_dev_sb+0x3f6/0x820 fs/bcachefs/buckets.c:1128
> bch2_trans_mark_dev_sbs_flags+0x6be/0x720 fs/bcachefs/buckets.c:1138
> bch2_fs_initialize+0xba0/0x1610 fs/bcachefs/recovery.c:1149
> bch2_fs_start+0x36d/0x610 fs/bcachefs/super.c:1042
> bch2_fs_get_tree+0xd8d/0x1740 fs/bcachefs/fs.c:2203
> vfs_get_tree+0x90/0x2b0 fs/super.c:1814
> do_new_mount+0x2be/0xb40 fs/namespace.c:3560
> do_mount fs/namespace.c:3900 [inline]
> __do_sys_mount fs/namespace.c:4111 [inline]
> __se_sys_mount+0x2d6/0x3c0 fs/namespace.c:4088
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7fcaed38e58a
> Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb a6 e8 de 1a 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007fcaec5fde68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
> RAX: ffffffffffffffda RBX: 00007fcaec5fdef0 RCX: 00007fcaed38e58a
> RDX: 00004000000000c0 RSI: 0000400000000180 RDI: 00007fcaec5fdeb0
> RBP: 00004000000000c0 R08: 00007fcaec5fdef0 R09: 0000000000000000
>
>> ---
>> fs/bcachefs/six.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/bcachefs/six.c b/fs/bcachefs/six.c
>> index 7e7c66a1e1a6..ccdc6d496910 100644
>> --- a/fs/bcachefs/six.c
>> +++ b/fs/bcachefs/six.c
>> @@ -873,7 +873,7 @@ void __six_lock_init(struct six_lock *lock, const char *name,
>> * failure if they wish by checking lock->readers, but generally
>> * will not want to treat it as an error.
>> */
>> - lock->readers = alloc_percpu(unsigned);
>> + lock->readers = alloc_percpu_gfp(unsigned, GFP_NOWAIT|__GFP_NOWARN);
>> }
>> #endif
>> }
>> --
>> 2.47.0
>>
* Re: [PATCH] bcachefs: Use alloc_percpu_gfp to avoid deadlock
2025-02-20 10:57 ` Alan Huang
@ 2025-02-20 12:40 ` Kent Overstreet
2025-02-20 12:44 ` Alan Huang
2025-02-20 17:16 ` Vlastimil Babka
1 sibling, 1 reply; 9+ messages in thread
From: Kent Overstreet @ 2025-02-20 12:40 UTC (permalink / raw)
To: Alan Huang
Cc: linux-bcachefs, syzbot+fe63f377148a6371a9db, linux-mm, Tejun Heo,
Dennis Zhou, Christoph Lameter
On Thu, Feb 20, 2025 at 06:57:32PM +0800, Alan Huang wrote:
> Ping
I really want to get this fixed in percpu...
let's leave this until we can fix it properly; this has come up before
and I don't want to just kick the can down the road again
(yes, that means fixing the global percpu allocation lock)
>
> > On Feb 12, 2025, at 22:27, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >
> > Adding pcpu people to the CC
> >
> > On Wed, Feb 12, 2025 at 06:06:25PM +0800, Alan Huang wrote:
> >> The cycle:
> >>
> >> CPU0:              CPU1:
> >> bc->lock           pcpu_alloc_mutex
> >> pcpu_alloc_mutex   bc->lock
> >>
> >> Reported-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
> >> Tested-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
> >> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
> >
> > So pcpu_alloc_mutex -> fs_reclaim?
> >
> > That's really awkward; seems like something that might invite more
> > issues. We can apply your fix if we need to, but I want to hear what the
> > percpu people have to say first.
> >
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 6.14.0-rc2-syzkaller-00039-g09fbf3d50205 #0 Not tainted
> > ------------------------------------------------------
> > syz.0.21/5625 is trying to acquire lock:
> > ffffffff8ea19608 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
> >
> > but task is already holding lock:
> > ffff888051401c68 (&bc->lock){+.+.}-{4:4}, at: bch2_btree_node_mem_alloc+0x559/0x16f0 fs/bcachefs/btree_cache.c:804
> >
> > which lock already depends on the new lock.
> >
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #2 (&bc->lock){+.+.}-{4:4}:
> > lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
> > __mutex_lock_common kernel/locking/mutex.c:585 [inline]
> > __mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
> > bch2_btree_cache_scan+0x184/0xec0 fs/bcachefs/btree_cache.c:482
> > do_shrink_slab+0x72d/0x1160 mm/shrinker.c:437
> > shrink_slab+0x1093/0x14d0 mm/shrinker.c:664
> > shrink_one+0x43b/0x850 mm/vmscan.c:4868
> > shrink_many mm/vmscan.c:4929 [inline]
> > lru_gen_shrink_node mm/vmscan.c:5007 [inline]
> > shrink_node+0x37c5/0x3e50 mm/vmscan.c:5978
> > kswapd_shrink_node mm/vmscan.c:6807 [inline]
> > balance_pgdat mm/vmscan.c:6999 [inline]
> > kswapd+0x20f3/0x3b10 mm/vmscan.c:7264
> > kthread+0x7a9/0x920 kernel/kthread.c:464
> > ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
> > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
> >
> > -> #1 (fs_reclaim){+.+.}-{0:0}:
> > lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
> > __fs_reclaim_acquire mm/page_alloc.c:3853 [inline]
> > fs_reclaim_acquire+0x88/0x130 mm/page_alloc.c:3867
> > might_alloc include/linux/sched/mm.h:318 [inline]
> > slab_pre_alloc_hook mm/slub.c:4066 [inline]
> > slab_alloc_node mm/slub.c:4144 [inline]
> > __do_kmalloc_node mm/slub.c:4293 [inline]
> > __kmalloc_noprof+0xae/0x4c0 mm/slub.c:4306
> > kmalloc_noprof include/linux/slab.h:905 [inline]
> > kzalloc_noprof include/linux/slab.h:1037 [inline]
> > pcpu_mem_zalloc mm/percpu.c:510 [inline]
> > pcpu_alloc_chunk mm/percpu.c:1430 [inline]
> > pcpu_create_chunk+0x57/0xbc0 mm/percpu-vm.c:338
> > pcpu_balance_populated mm/percpu.c:2063 [inline]
> > pcpu_balance_workfn+0xc4d/0xd40 mm/percpu.c:2200
> > process_one_work kernel/workqueue.c:3236 [inline]
> > process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3317
> > worker_thread+0x870/0xd30 kernel/workqueue.c:3398
> > kthread+0x7a9/0x920 kernel/kthread.c:464
> > ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
> > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
> >
> > -> #0 (pcpu_alloc_mutex){+.+.}-{4:4}:
> > check_prev_add kernel/locking/lockdep.c:3163 [inline]
> > check_prevs_add kernel/locking/lockdep.c:3282 [inline]
> > validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3906
> > __lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5228
> > lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
> > __mutex_lock_common kernel/locking/mutex.c:585 [inline]
> > __mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
> > pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
> > __six_lock_init+0x104/0x150 fs/bcachefs/six.c:876
> > bch2_btree_lock_init+0x38/0x100 fs/bcachefs/btree_locking.c:12
> > bch2_btree_node_mem_alloc+0x565/0x16f0 fs/bcachefs/btree_cache.c:807
> > __bch2_btree_node_alloc fs/bcachefs/btree_update_interior.c:304 [inline]
> > bch2_btree_reserve_get+0x2df/0x1890 fs/bcachefs/btree_update_interior.c:532
> > bch2_btree_update_start+0xe56/0x14e0 fs/bcachefs/btree_update_interior.c:1230
> > bch2_btree_split_leaf+0x121/0x880 fs/bcachefs/btree_update_interior.c:1851
> > bch2_trans_commit_error+0x212/0x1380 fs/bcachefs/btree_trans_commit.c:908
> > __bch2_trans_commit+0x812b/0x97a0 fs/bcachefs/btree_trans_commit.c:1085
> > bch2_trans_commit fs/bcachefs/btree_update.h:183 [inline]
> > bch2_trans_mark_metadata_bucket+0x47a/0x17b0 fs/bcachefs/buckets.c:1043
> > bch2_trans_mark_metadata_sectors fs/bcachefs/buckets.c:1060 [inline]
> > __bch2_trans_mark_dev_sb fs/bcachefs/buckets.c:1100 [inline]
> > bch2_trans_mark_dev_sb+0x3f6/0x820 fs/bcachefs/buckets.c:1128
> > bch2_trans_mark_dev_sbs_flags+0x6be/0x720 fs/bcachefs/buckets.c:1138
> > bch2_fs_initialize+0xba0/0x1610 fs/bcachefs/recovery.c:1149
> > bch2_fs_start+0x36d/0x610 fs/bcachefs/super.c:1042
> > bch2_fs_get_tree+0xd8d/0x1740 fs/bcachefs/fs.c:2203
> > vfs_get_tree+0x90/0x2b0 fs/super.c:1814
> > do_new_mount+0x2be/0xb40 fs/namespace.c:3560
> > do_mount fs/namespace.c:3900 [inline]
> > __do_sys_mount fs/namespace.c:4111 [inline]
> > __se_sys_mount+0x2d6/0x3c0 fs/namespace.c:4088
> > do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >
> > other info that might help us debug this:
> >
> > Chain exists of:
> > pcpu_alloc_mutex --> fs_reclaim --> &bc->lock
> >
> > Possible unsafe locking scenario:
> >
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(&bc->lock);
> >                                lock(fs_reclaim);
> >                                lock(&bc->lock);
> >   lock(pcpu_alloc_mutex);
> >
> > *** DEADLOCK ***
> >
> > 4 locks held by syz.0.21/5625:
> > #0: ffff888051400278 (&c->state_lock){+.+.}-{4:4}, at: bch2_fs_start+0x45/0x610 fs/bcachefs/super.c:1010
> > #1: ffff888051404378 (&c->btree_trans_barrier){.+.+}-{0:0}, at: srcu_lock_acquire include/linux/srcu.h:164 [inline]
> > #1: ffff888051404378 (&c->btree_trans_barrier){.+.+}-{0:0}, at: srcu_read_lock include/linux/srcu.h:256 [inline]
> > #1: ffff888051404378 (&c->btree_trans_barrier){.+.+}-{0:0}, at: __bch2_trans_get+0x7e4/0xd30 fs/bcachefs/btree_iter.c:3377
> > #2: ffff8880514266d0 (&c->gc_lock){.+.+}-{4:4}, at: bch2_btree_update_start+0x682/0x14e0 fs/bcachefs/btree_update_interior.c:1180
> > #3: ffff888051401c68 (&bc->lock){+.+.}-{4:4}, at: bch2_btree_node_mem_alloc+0x559/0x16f0 fs/bcachefs/btree_cache.c:804
> >
> > stack backtrace:
> > CPU: 0 UID: 0 PID: 5625 Comm: syz.0.21 Not tainted 6.14.0-rc2-syzkaller-00039-g09fbf3d50205 #0
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> > Call Trace:
> > <TASK>
> > __dump_stack lib/dump_stack.c:94 [inline]
> > dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
> > print_circular_bug+0x13a/0x1b0 kernel/locking/lockdep.c:2076
> > check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2208
> > check_prev_add kernel/locking/lockdep.c:3163 [inline]
> > check_prevs_add kernel/locking/lockdep.c:3282 [inline]
> > validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3906
> > __lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5228
> > lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
> > __mutex_lock_common kernel/locking/mutex.c:585 [inline]
> > __mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
> > pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
> > __six_lock_init+0x104/0x150 fs/bcachefs/six.c:876
> > bch2_btree_lock_init+0x38/0x100 fs/bcachefs/btree_locking.c:12
> > bch2_btree_node_mem_alloc+0x565/0x16f0 fs/bcachefs/btree_cache.c:807
> > __bch2_btree_node_alloc fs/bcachefs/btree_update_interior.c:304 [inline]
> > bch2_btree_reserve_get+0x2df/0x1890 fs/bcachefs/btree_update_interior.c:532
> > bch2_btree_update_start+0xe56/0x14e0 fs/bcachefs/btree_update_interior.c:1230
> > bch2_btree_split_leaf+0x121/0x880 fs/bcachefs/btree_update_interior.c:1851
> > bch2_trans_commit_error+0x212/0x1380 fs/bcachefs/btree_trans_commit.c:908
> > __bch2_trans_commit+0x812b/0x97a0 fs/bcachefs/btree_trans_commit.c:1085
> > bch2_trans_commit fs/bcachefs/btree_update.h:183 [inline]
> > bch2_trans_mark_metadata_bucket+0x47a/0x17b0 fs/bcachefs/buckets.c:1043
> > bch2_trans_mark_metadata_sectors fs/bcachefs/buckets.c:1060 [inline]
> > __bch2_trans_mark_dev_sb fs/bcachefs/buckets.c:1100 [inline]
> > bch2_trans_mark_dev_sb+0x3f6/0x820 fs/bcachefs/buckets.c:1128
> > bch2_trans_mark_dev_sbs_flags+0x6be/0x720 fs/bcachefs/buckets.c:1138
> > bch2_fs_initialize+0xba0/0x1610 fs/bcachefs/recovery.c:1149
> > bch2_fs_start+0x36d/0x610 fs/bcachefs/super.c:1042
> > bch2_fs_get_tree+0xd8d/0x1740 fs/bcachefs/fs.c:2203
> > vfs_get_tree+0x90/0x2b0 fs/super.c:1814
> > do_new_mount+0x2be/0xb40 fs/namespace.c:3560
> > do_mount fs/namespace.c:3900 [inline]
> > __do_sys_mount fs/namespace.c:4111 [inline]
> > __se_sys_mount+0x2d6/0x3c0 fs/namespace.c:4088
> > do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > RIP: 0033:0x7fcaed38e58a
> > Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb a6 e8 de 1a 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
> > RSP: 002b:00007fcaec5fde68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
> > RAX: ffffffffffffffda RBX: 00007fcaec5fdef0 RCX: 00007fcaed38e58a
> > RDX: 00004000000000c0 RSI: 0000400000000180 RDI: 00007fcaec5fdeb0
> > RBP: 00004000000000c0 R08: 00007fcaec5fdef0 R09: 0000000000000000
> >
> >> ---
> >> fs/bcachefs/six.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/fs/bcachefs/six.c b/fs/bcachefs/six.c
> >> index 7e7c66a1e1a6..ccdc6d496910 100644
> >> --- a/fs/bcachefs/six.c
> >> +++ b/fs/bcachefs/six.c
> >> @@ -873,7 +873,7 @@ void __six_lock_init(struct six_lock *lock, const char *name,
> >> * failure if they wish by checking lock->readers, but generally
> >> * will not want to treat it as an error.
> >> */
> >> - lock->readers = alloc_percpu(unsigned);
> >> + lock->readers = alloc_percpu_gfp(unsigned, GFP_NOWAIT|__GFP_NOWARN);
> >> }
> >> #endif
> >> }
> >> --
> >> 2.47.0
> >>
>
* Re: [PATCH] bcachefs: Use alloc_percpu_gfp to avoid deadlock
2025-02-20 12:40 ` Kent Overstreet
@ 2025-02-20 12:44 ` Alan Huang
0 siblings, 0 replies; 9+ messages in thread
From: Alan Huang @ 2025-02-20 12:44 UTC (permalink / raw)
To: Kent Overstreet
Cc: linux-bcachefs, syzbot+fe63f377148a6371a9db, linux-mm, Tejun Heo,
Dennis Zhou, Christoph Lameter
> On Feb 20, 2025, at 20:40, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> On Thu, Feb 20, 2025 at 06:57:32PM +0800, Alan Huang wrote:
>> Ping
>
> I really want to get this fixed in percpu...
>
> let's leave this until we can fix it properly; this has come up before
> and I don't want to just kick the can down the road again
>
> (yes, that means fixing the global percpu allocation lock)
The ping is for the percpu people...
>
>>
>>> On Feb 12, 2025, at 22:27, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>>>
>>> Adding pcpu people to the CC
>>>
>>> On Wed, Feb 12, 2025 at 06:06:25PM +0800, Alan Huang wrote:
>>>> The cycle:
>>>>
>>>> CPU0:              CPU1:
>>>> bc->lock           pcpu_alloc_mutex
>>>> pcpu_alloc_mutex   bc->lock
>>>>
>>>> Reported-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
>>>> Tested-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
>>>> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
>>>
>>> So pcpu_alloc_mutex -> fs_reclaim?
>>>
>>> That's really awkward; seems like something that might invite more
>>> issues. We can apply your fix if we need to, but I want to hear what the
>>> percpu people have to say first.
>>>
>>> ======================================================
>>> WARNING: possible circular locking dependency detected
>>> 6.14.0-rc2-syzkaller-00039-g09fbf3d50205 #0 Not tainted
>>> ------------------------------------------------------
>>> syz.0.21/5625 is trying to acquire lock:
>>> ffffffff8ea19608 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
>>>
>>> but task is already holding lock:
>>> ffff888051401c68 (&bc->lock){+.+.}-{4:4}, at: bch2_btree_node_mem_alloc+0x559/0x16f0 fs/bcachefs/btree_cache.c:804
>>>
>>> which lock already depends on the new lock.
>>>
>>>
>>> the existing dependency chain (in reverse order) is:
>>>
>>> -> #2 (&bc->lock){+.+.}-{4:4}:
>>> lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
>>> __mutex_lock_common kernel/locking/mutex.c:585 [inline]
>>> __mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
>>> bch2_btree_cache_scan+0x184/0xec0 fs/bcachefs/btree_cache.c:482
>>> do_shrink_slab+0x72d/0x1160 mm/shrinker.c:437
>>> shrink_slab+0x1093/0x14d0 mm/shrinker.c:664
>>> shrink_one+0x43b/0x850 mm/vmscan.c:4868
>>> shrink_many mm/vmscan.c:4929 [inline]
>>> lru_gen_shrink_node mm/vmscan.c:5007 [inline]
>>> shrink_node+0x37c5/0x3e50 mm/vmscan.c:5978
>>> kswapd_shrink_node mm/vmscan.c:6807 [inline]
>>> balance_pgdat mm/vmscan.c:6999 [inline]
>>> kswapd+0x20f3/0x3b10 mm/vmscan.c:7264
>>> kthread+0x7a9/0x920 kernel/kthread.c:464
>>> ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
>>> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>>>
>>> -> #1 (fs_reclaim){+.+.}-{0:0}:
>>> lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
>>> __fs_reclaim_acquire mm/page_alloc.c:3853 [inline]
>>> fs_reclaim_acquire+0x88/0x130 mm/page_alloc.c:3867
>>> might_alloc include/linux/sched/mm.h:318 [inline]
>>> slab_pre_alloc_hook mm/slub.c:4066 [inline]
>>> slab_alloc_node mm/slub.c:4144 [inline]
>>> __do_kmalloc_node mm/slub.c:4293 [inline]
>>> __kmalloc_noprof+0xae/0x4c0 mm/slub.c:4306
>>> kmalloc_noprof include/linux/slab.h:905 [inline]
>>> kzalloc_noprof include/linux/slab.h:1037 [inline]
>>> pcpu_mem_zalloc mm/percpu.c:510 [inline]
>>> pcpu_alloc_chunk mm/percpu.c:1430 [inline]
>>> pcpu_create_chunk+0x57/0xbc0 mm/percpu-vm.c:338
>>> pcpu_balance_populated mm/percpu.c:2063 [inline]
>>> pcpu_balance_workfn+0xc4d/0xd40 mm/percpu.c:2200
>>> process_one_work kernel/workqueue.c:3236 [inline]
>>> process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3317
>>> worker_thread+0x870/0xd30 kernel/workqueue.c:3398
>>> kthread+0x7a9/0x920 kernel/kthread.c:464
>>> ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
>>> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>>>
>>> -> #0 (pcpu_alloc_mutex){+.+.}-{4:4}:
>>> check_prev_add kernel/locking/lockdep.c:3163 [inline]
>>> check_prevs_add kernel/locking/lockdep.c:3282 [inline]
>>> validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3906
>>> __lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5228
>>> lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
>>> __mutex_lock_common kernel/locking/mutex.c:585 [inline]
>>> __mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
>>> pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
>>> __six_lock_init+0x104/0x150 fs/bcachefs/six.c:876
>>> bch2_btree_lock_init+0x38/0x100 fs/bcachefs/btree_locking.c:12
>>> bch2_btree_node_mem_alloc+0x565/0x16f0 fs/bcachefs/btree_cache.c:807
>>> __bch2_btree_node_alloc fs/bcachefs/btree_update_interior.c:304 [inline]
>>> bch2_btree_reserve_get+0x2df/0x1890 fs/bcachefs/btree_update_interior.c:532
>>> bch2_btree_update_start+0xe56/0x14e0 fs/bcachefs/btree_update_interior.c:1230
>>> bch2_btree_split_leaf+0x121/0x880 fs/bcachefs/btree_update_interior.c:1851
>>> bch2_trans_commit_error+0x212/0x1380 fs/bcachefs/btree_trans_commit.c:908
>>> __bch2_trans_commit+0x812b/0x97a0 fs/bcachefs/btree_trans_commit.c:1085
>>> bch2_trans_commit fs/bcachefs/btree_update.h:183 [inline]
>>> bch2_trans_mark_metadata_bucket+0x47a/0x17b0 fs/bcachefs/buckets.c:1043
>>> bch2_trans_mark_metadata_sectors fs/bcachefs/buckets.c:1060 [inline]
>>> __bch2_trans_mark_dev_sb fs/bcachefs/buckets.c:1100 [inline]
>>> bch2_trans_mark_dev_sb+0x3f6/0x820 fs/bcachefs/buckets.c:1128
>>> bch2_trans_mark_dev_sbs_flags+0x6be/0x720 fs/bcachefs/buckets.c:1138
>>> bch2_fs_initialize+0xba0/0x1610 fs/bcachefs/recovery.c:1149
>>> bch2_fs_start+0x36d/0x610 fs/bcachefs/super.c:1042
>>> bch2_fs_get_tree+0xd8d/0x1740 fs/bcachefs/fs.c:2203
>>> vfs_get_tree+0x90/0x2b0 fs/super.c:1814
>>> do_new_mount+0x2be/0xb40 fs/namespace.c:3560
>>> do_mount fs/namespace.c:3900 [inline]
>>> __do_sys_mount fs/namespace.c:4111 [inline]
>>> __se_sys_mount+0x2d6/0x3c0 fs/namespace.c:4088
>>> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>>> do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
>>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>>>
>>> other info that might help us debug this:
>>>
>>> Chain exists of:
>>> pcpu_alloc_mutex --> fs_reclaim --> &bc->lock
>>>
>>> Possible unsafe locking scenario:
>>>
>>>        CPU0                    CPU1
>>>        ----                    ----
>>>   lock(&bc->lock);
>>>                                lock(fs_reclaim);
>>>                                lock(&bc->lock);
>>>   lock(pcpu_alloc_mutex);
>>>
>>> *** DEADLOCK ***
>>>
>>> 4 locks held by syz.0.21/5625:
>>> #0: ffff888051400278 (&c->state_lock){+.+.}-{4:4}, at: bch2_fs_start+0x45/0x610 fs/bcachefs/super.c:1010
>>> #1: ffff888051404378 (&c->btree_trans_barrier){.+.+}-{0:0}, at: srcu_lock_acquire include/linux/srcu.h:164 [inline]
>>> #1: ffff888051404378 (&c->btree_trans_barrier){.+.+}-{0:0}, at: srcu_read_lock include/linux/srcu.h:256 [inline]
>>> #1: ffff888051404378 (&c->btree_trans_barrier){.+.+}-{0:0}, at: __bch2_trans_get+0x7e4/0xd30 fs/bcachefs/btree_iter.c:3377
>>> #2: ffff8880514266d0 (&c->gc_lock){.+.+}-{4:4}, at: bch2_btree_update_start+0x682/0x14e0 fs/bcachefs/btree_update_interior.c:1180
>>> #3: ffff888051401c68 (&bc->lock){+.+.}-{4:4}, at: bch2_btree_node_mem_alloc+0x559/0x16f0 fs/bcachefs/btree_cache.c:804
>>>
>>> stack backtrace:
>>> CPU: 0 UID: 0 PID: 5625 Comm: syz.0.21 Not tainted 6.14.0-rc2-syzkaller-00039-g09fbf3d50205 #0
>>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
>>> Call Trace:
>>> <TASK>
>>> __dump_stack lib/dump_stack.c:94 [inline]
>>> dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
>>> print_circular_bug+0x13a/0x1b0 kernel/locking/lockdep.c:2076
>>> check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2208
>>> check_prev_add kernel/locking/lockdep.c:3163 [inline]
>>> check_prevs_add kernel/locking/lockdep.c:3282 [inline]
>>> validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3906
>>> __lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5228
>>> lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
>>> __mutex_lock_common kernel/locking/mutex.c:585 [inline]
>>> __mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
>>> pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
>>> __six_lock_init+0x104/0x150 fs/bcachefs/six.c:876
>>> bch2_btree_lock_init+0x38/0x100 fs/bcachefs/btree_locking.c:12
>>> bch2_btree_node_mem_alloc+0x565/0x16f0 fs/bcachefs/btree_cache.c:807
>>> __bch2_btree_node_alloc fs/bcachefs/btree_update_interior.c:304 [inline]
>>> bch2_btree_reserve_get+0x2df/0x1890 fs/bcachefs/btree_update_interior.c:532
>>> bch2_btree_update_start+0xe56/0x14e0 fs/bcachefs/btree_update_interior.c:1230
>>> bch2_btree_split_leaf+0x121/0x880 fs/bcachefs/btree_update_interior.c:1851
>>> bch2_trans_commit_error+0x212/0x1380 fs/bcachefs/btree_trans_commit.c:908
>>> __bch2_trans_commit+0x812b/0x97a0 fs/bcachefs/btree_trans_commit.c:1085
>>> bch2_trans_commit fs/bcachefs/btree_update.h:183 [inline]
>>> bch2_trans_mark_metadata_bucket+0x47a/0x17b0 fs/bcachefs/buckets.c:1043
>>> bch2_trans_mark_metadata_sectors fs/bcachefs/buckets.c:1060 [inline]
>>> __bch2_trans_mark_dev_sb fs/bcachefs/buckets.c:1100 [inline]
>>> bch2_trans_mark_dev_sb+0x3f6/0x820 fs/bcachefs/buckets.c:1128
>>> bch2_trans_mark_dev_sbs_flags+0x6be/0x720 fs/bcachefs/buckets.c:1138
>>> bch2_fs_initialize+0xba0/0x1610 fs/bcachefs/recovery.c:1149
>>> bch2_fs_start+0x36d/0x610 fs/bcachefs/super.c:1042
>>> bch2_fs_get_tree+0xd8d/0x1740 fs/bcachefs/fs.c:2203
>>> vfs_get_tree+0x90/0x2b0 fs/super.c:1814
>>> do_new_mount+0x2be/0xb40 fs/namespace.c:3560
>>> do_mount fs/namespace.c:3900 [inline]
>>> __do_sys_mount fs/namespace.c:4111 [inline]
>>> __se_sys_mount+0x2d6/0x3c0 fs/namespace.c:4088
>>> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>>> do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
>>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>>> RIP: 0033:0x7fcaed38e58a
>>> Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb a6 e8 de 1a 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
>>> RSP: 002b:00007fcaec5fde68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
>>> RAX: ffffffffffffffda RBX: 00007fcaec5fdef0 RCX: 00007fcaed38e58a
>>> RDX: 00004000000000c0 RSI: 0000400000000180 RDI: 00007fcaec5fdeb0
>>> RBP: 00004000000000c0 R08: 00007fcaec5fdef0 R09: 0000000000000000
>>>
>>>> ---
>>>> fs/bcachefs/six.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/bcachefs/six.c b/fs/bcachefs/six.c
>>>> index 7e7c66a1e1a6..ccdc6d496910 100644
>>>> --- a/fs/bcachefs/six.c
>>>> +++ b/fs/bcachefs/six.c
>>>> @@ -873,7 +873,7 @@ void __six_lock_init(struct six_lock *lock, const char *name,
>>>> * failure if they wish by checking lock->readers, but generally
>>>> * will not want to treat it as an error.
>>>> */
>>>> - lock->readers = alloc_percpu(unsigned);
>>>> + lock->readers = alloc_percpu_gfp(unsigned, GFP_NOWAIT|__GFP_NOWARN);
>>>> }
>>>> #endif
>>>> }
>>>> --
>>>> 2.47.0
>>>>
>>
* Re: [PATCH] bcachefs: Use alloc_percpu_gfp to avoid deadlock
2025-02-20 10:57 ` Alan Huang
2025-02-20 12:40 ` Kent Overstreet
@ 2025-02-20 17:16 ` Vlastimil Babka
2025-02-20 20:37 ` Kent Overstreet
1 sibling, 1 reply; 9+ messages in thread
From: Vlastimil Babka @ 2025-02-20 17:16 UTC (permalink / raw)
To: Alan Huang, Kent Overstreet
Cc: linux-bcachefs, syzbot+fe63f377148a6371a9db, linux-mm, Tejun Heo,
Dennis Zhou, Christoph Lameter, Michal Hocko
On 2/20/25 11:57, Alan Huang wrote:
> Ping
>
>> On Feb 12, 2025, at 22:27, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>>
>> Adding pcpu people to the CC
>>
>> On Wed, Feb 12, 2025 at 06:06:25PM +0800, Alan Huang wrote:
>>> The cycle:
>>>
>>> CPU0:              CPU1:
>>> bc->lock           pcpu_alloc_mutex
>>> pcpu_alloc_mutex   bc->lock
>>>
>>> Reported-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
>>> Tested-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
>>> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
>>
>> So pcpu_alloc_mutex -> fs_reclaim?
>>
>> That's really awkward; seems like something that might invite more
>> issues. We can apply your fix if we need to, but I want to hear what the
>> percpu people have to say first.
>>
>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 6.14.0-rc2-syzkaller-00039-g09fbf3d50205 #0 Not tainted
>> ------------------------------------------------------
>> syz.0.21/5625 is trying to acquire lock:
>> ffffffff8ea19608 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
>>
>> but task is already holding lock:
>> ffff888051401c68 (&bc->lock){+.+.}-{4:4}, at: bch2_btree_node_mem_alloc+0x559/0x16f0 fs/bcachefs/btree_cache.c:804
>>
>> which lock already depends on the new lock.
>>
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #2 (&bc->lock){+.+.}-{4:4}:
>> lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
>> __mutex_lock_common kernel/locking/mutex.c:585 [inline]
>> __mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
>> bch2_btree_cache_scan+0x184/0xec0 fs/bcachefs/btree_cache.c:482
>> do_shrink_slab+0x72d/0x1160 mm/shrinker.c:437
>> shrink_slab+0x1093/0x14d0 mm/shrinker.c:664
>> shrink_one+0x43b/0x850 mm/vmscan.c:4868
>> shrink_many mm/vmscan.c:4929 [inline]
>> lru_gen_shrink_node mm/vmscan.c:5007 [inline]
>> shrink_node+0x37c5/0x3e50 mm/vmscan.c:5978
>> kswapd_shrink_node mm/vmscan.c:6807 [inline]
>> balance_pgdat mm/vmscan.c:6999 [inline]
>> kswapd+0x20f3/0x3b10 mm/vmscan.c:7264
>> kthread+0x7a9/0x920 kernel/kthread.c:464
>> ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
>> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>>
>> -> #1 (fs_reclaim){+.+.}-{0:0}:
>> lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
>> __fs_reclaim_acquire mm/page_alloc.c:3853 [inline]
>> fs_reclaim_acquire+0x88/0x130 mm/page_alloc.c:3867
>> might_alloc include/linux/sched/mm.h:318 [inline]
>> slab_pre_alloc_hook mm/slub.c:4066 [inline]
>> slab_alloc_node mm/slub.c:4144 [inline]
>> __do_kmalloc_node mm/slub.c:4293 [inline]
>> __kmalloc_noprof+0xae/0x4c0 mm/slub.c:4306
>> kmalloc_noprof include/linux/slab.h:905 [inline]
>> kzalloc_noprof include/linux/slab.h:1037 [inline]
>> pcpu_mem_zalloc mm/percpu.c:510 [inline]
>> pcpu_alloc_chunk mm/percpu.c:1430 [inline]
>> pcpu_create_chunk+0x57/0xbc0 mm/percpu-vm.c:338
>> pcpu_balance_populated mm/percpu.c:2063 [inline]
>> pcpu_balance_workfn+0xc4d/0xd40 mm/percpu.c:2200
>> process_one_work kernel/workqueue.c:3236 [inline]
>> process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3317
>> worker_thread+0x870/0xd30 kernel/workqueue.c:3398
>> kthread+0x7a9/0x920 kernel/kthread.c:464
>> ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
>> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Seeing this as part of the chain (fs reclaim from a worker doing
pcpu_balance_workfn) makes me think Michal's patch could be a fix to this:
https://lore.kernel.org/all/20250206122633.167896-1-mhocko@kernel.org/
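(Not a summary of that patch, but for context: the generic way to keep a
worker's allocations from adding a mutex -> fs_reclaim edge is a NOFS
allocation scope, roughly like this sketch, with alloc_md_nofs being a
made-up name:)

#include <linux/sched/mm.h>
#include <linux/slab.h>

/*
 * Illustration only: inside a memalloc_nofs scope, current_gfp_context()
 * drops __GFP_FS, so fs_reclaim_acquire() skips the fs_reclaim lockdep map
 * for allocations made in this window.
 */
static void *alloc_md_nofs(size_t size)
{
        unsigned int flags = memalloc_nofs_save();
        void *p = kzalloc(size, GFP_KERNEL);

        memalloc_nofs_restore(flags);
        return p;
}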
>> -> #0 (pcpu_alloc_mutex){+.+.}-{4:4}:
>> check_prev_add kernel/locking/lockdep.c:3163 [inline]
>> check_prevs_add kernel/locking/lockdep.c:3282 [inline]
>> validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3906
>> __lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5228
>> lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
>> __mutex_lock_common kernel/locking/mutex.c:585 [inline]
>> __mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
>> pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
>> __six_lock_init+0x104/0x150 fs/bcachefs/six.c:876
>> bch2_btree_lock_init+0x38/0x100 fs/bcachefs/btree_locking.c:12
>> bch2_btree_node_mem_alloc+0x565/0x16f0 fs/bcachefs/btree_cache.c:807
>> __bch2_btree_node_alloc fs/bcachefs/btree_update_interior.c:304 [inline]
>> bch2_btree_reserve_get+0x2df/0x1890 fs/bcachefs/btree_update_interior.c:532
>> bch2_btree_update_start+0xe56/0x14e0 fs/bcachefs/btree_update_interior.c:1230
>> bch2_btree_split_leaf+0x121/0x880 fs/bcachefs/btree_update_interior.c:1851
>> bch2_trans_commit_error+0x212/0x1380 fs/bcachefs/btree_trans_commit.c:908
>> __bch2_trans_commit+0x812b/0x97a0 fs/bcachefs/btree_trans_commit.c:1085
>> bch2_trans_commit fs/bcachefs/btree_update.h:183 [inline]
>> bch2_trans_mark_metadata_bucket+0x47a/0x17b0 fs/bcachefs/buckets.c:1043
>> bch2_trans_mark_metadata_sectors fs/bcachefs/buckets.c:1060 [inline]
>> __bch2_trans_mark_dev_sb fs/bcachefs/buckets.c:1100 [inline]
>> bch2_trans_mark_dev_sb+0x3f6/0x820 fs/bcachefs/buckets.c:1128
>> bch2_trans_mark_dev_sbs_flags+0x6be/0x720 fs/bcachefs/buckets.c:1138
>> bch2_fs_initialize+0xba0/0x1610 fs/bcachefs/recovery.c:1149
>> bch2_fs_start+0x36d/0x610 fs/bcachefs/super.c:1042
>> bch2_fs_get_tree+0xd8d/0x1740 fs/bcachefs/fs.c:2203
>> vfs_get_tree+0x90/0x2b0 fs/super.c:1814
>> do_new_mount+0x2be/0xb40 fs/namespace.c:3560
>> do_mount fs/namespace.c:3900 [inline]
>> __do_sys_mount fs/namespace.c:4111 [inline]
>> __se_sys_mount+0x2d6/0x3c0 fs/namespace.c:4088
>> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>> do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>>
>> other info that might help us debug this:
>>
>> Chain exists of:
>> pcpu_alloc_mutex --> fs_reclaim --> &bc->lock
>>
>> Possible unsafe locking scenario:
>>
>>        CPU0                    CPU1
>>        ----                    ----
>>   lock(&bc->lock);
>>                                lock(fs_reclaim);
>>                                lock(&bc->lock);
>>   lock(pcpu_alloc_mutex);
>>
>> *** DEADLOCK ***
>>
>> 4 locks held by syz.0.21/5625:
>> #0: ffff888051400278 (&c->state_lock){+.+.}-{4:4}, at: bch2_fs_start+0x45/0x610 fs/bcachefs/super.c:1010
>> #1: ffff888051404378 (&c->btree_trans_barrier){.+.+}-{0:0}, at: srcu_lock_acquire include/linux/srcu.h:164 [inline]
>> #1: ffff888051404378 (&c->btree_trans_barrier){.+.+}-{0:0}, at: srcu_read_lock include/linux/srcu.h:256 [inline]
>> #1: ffff888051404378 (&c->btree_trans_barrier){.+.+}-{0:0}, at: __bch2_trans_get+0x7e4/0xd30 fs/bcachefs/btree_iter.c:3377
>> #2: ffff8880514266d0 (&c->gc_lock){.+.+}-{4:4}, at: bch2_btree_update_start+0x682/0x14e0 fs/bcachefs/btree_update_interior.c:1180
>> #3: ffff888051401c68 (&bc->lock){+.+.}-{4:4}, at: bch2_btree_node_mem_alloc+0x559/0x16f0 fs/bcachefs/btree_cache.c:804
>>
>> stack backtrace:
>> CPU: 0 UID: 0 PID: 5625 Comm: syz.0.21 Not tainted 6.14.0-rc2-syzkaller-00039-g09fbf3d50205 #0
>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
>> Call Trace:
>> <TASK>
>> __dump_stack lib/dump_stack.c:94 [inline]
>> dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
>> print_circular_bug+0x13a/0x1b0 kernel/locking/lockdep.c:2076
>> check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2208
>> check_prev_add kernel/locking/lockdep.c:3163 [inline]
>> check_prevs_add kernel/locking/lockdep.c:3282 [inline]
>> validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3906
>> __lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5228
>> lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
>> __mutex_lock_common kernel/locking/mutex.c:585 [inline]
>> __mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
>> pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
>> __six_lock_init+0x104/0x150 fs/bcachefs/six.c:876
>> bch2_btree_lock_init+0x38/0x100 fs/bcachefs/btree_locking.c:12
>> bch2_btree_node_mem_alloc+0x565/0x16f0 fs/bcachefs/btree_cache.c:807
>> __bch2_btree_node_alloc fs/bcachefs/btree_update_interior.c:304 [inline]
>> bch2_btree_reserve_get+0x2df/0x1890 fs/bcachefs/btree_update_interior.c:532
>> bch2_btree_update_start+0xe56/0x14e0 fs/bcachefs/btree_update_interior.c:1230
>> bch2_btree_split_leaf+0x121/0x880 fs/bcachefs/btree_update_interior.c:1851
>> bch2_trans_commit_error+0x212/0x1380 fs/bcachefs/btree_trans_commit.c:908
>> __bch2_trans_commit+0x812b/0x97a0 fs/bcachefs/btree_trans_commit.c:1085
>> bch2_trans_commit fs/bcachefs/btree_update.h:183 [inline]
>> bch2_trans_mark_metadata_bucket+0x47a/0x17b0 fs/bcachefs/buckets.c:1043
>> bch2_trans_mark_metadata_sectors fs/bcachefs/buckets.c:1060 [inline]
>> __bch2_trans_mark_dev_sb fs/bcachefs/buckets.c:1100 [inline]
>> bch2_trans_mark_dev_sb+0x3f6/0x820 fs/bcachefs/buckets.c:1128
>> bch2_trans_mark_dev_sbs_flags+0x6be/0x720 fs/bcachefs/buckets.c:1138
>> bch2_fs_initialize+0xba0/0x1610 fs/bcachefs/recovery.c:1149
>> bch2_fs_start+0x36d/0x610 fs/bcachefs/super.c:1042
>> bch2_fs_get_tree+0xd8d/0x1740 fs/bcachefs/fs.c:2203
>> vfs_get_tree+0x90/0x2b0 fs/super.c:1814
>> do_new_mount+0x2be/0xb40 fs/namespace.c:3560
>> do_mount fs/namespace.c:3900 [inline]
>> __do_sys_mount fs/namespace.c:4111 [inline]
>> __se_sys_mount+0x2d6/0x3c0 fs/namespace.c:4088
>> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>> do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>> RIP: 0033:0x7fcaed38e58a
>> Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb a6 e8 de 1a 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
>> RSP: 002b:00007fcaec5fde68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
>> RAX: ffffffffffffffda RBX: 00007fcaec5fdef0 RCX: 00007fcaed38e58a
>> RDX: 00004000000000c0 RSI: 0000400000000180 RDI: 00007fcaec5fdeb0
>> RBP: 00004000000000c0 R08: 00007fcaec5fdef0 R09: 0000000000000000
>>
>>> ---
>>> fs/bcachefs/six.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/bcachefs/six.c b/fs/bcachefs/six.c
>>> index 7e7c66a1e1a6..ccdc6d496910 100644
>>> --- a/fs/bcachefs/six.c
>>> +++ b/fs/bcachefs/six.c
>>> @@ -873,7 +873,7 @@ void __six_lock_init(struct six_lock *lock, const char *name,
>>> * failure if they wish by checking lock->readers, but generally
>>> * will not want to treat it as an error.
>>> */
>>> - lock->readers = alloc_percpu(unsigned);
>>> + lock->readers = alloc_percpu_gfp(unsigned, GFP_NOWAIT|__GFP_NOWARN);
>>> }
>>> #endif
>>> }
>>> --
>>> 2.47.0
>>>
>
>
* Re: [PATCH] bcachefs: Use alloc_percpu_gfp to avoid deadlock
2025-02-20 17:16 ` Vlastimil Babka
@ 2025-02-20 20:37 ` Kent Overstreet
2025-02-21 2:46 ` Dennis Zhou
0 siblings, 1 reply; 9+ messages in thread
From: Kent Overstreet @ 2025-02-20 20:37 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Alan Huang, linux-bcachefs, syzbot+fe63f377148a6371a9db,
linux-mm, Tejun Heo, Dennis Zhou, Christoph Lameter,
Michal Hocko
On Thu, Feb 20, 2025 at 06:16:43PM +0100, Vlastimil Babka wrote:
> On 2/20/25 11:57, Alan Huang wrote:
> > Ping
> >
> >> On Feb 12, 2025, at 22:27, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >>
> >> Adding pcpu people to the CC
> >>
> >> On Wed, Feb 12, 2025 at 06:06:25PM +0800, Alan Huang wrote:
> >>> The cycle:
> >>>
> >>> CPU0:              CPU1:
> >>> bc->lock           pcpu_alloc_mutex
> >>> pcpu_alloc_mutex   bc->lock
> >>>
> >>> Reported-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
> >>> Tested-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
> >>> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
> >>
> >> So pcpu_alloc_mutex -> fs_reclaim?
> >>
> >> That's really awkward; seems like something that might invite more
> >> issues. We can apply your fix if we need to, but I want to hear what the
> >> percpu people have to say first.
> >>
> >> ======================================================
> >> WARNING: possible circular locking dependency detected
> >> 6.14.0-rc2-syzkaller-00039-g09fbf3d50205 #0 Not tainted
> >> ------------------------------------------------------
> >> syz.0.21/5625 is trying to acquire lock:
> >> ffffffff8ea19608 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
> >>
> >> but task is already holding lock:
> >> ffff888051401c68 (&bc->lock){+.+.}-{4:4}, at: bch2_btree_node_mem_alloc+0x559/0x16f0 fs/bcachefs/btree_cache.c:804
> >>
> >> which lock already depends on the new lock.
> >>
> >>
> >> the existing dependency chain (in reverse order) is:
> >>
> >> -> #2 (&bc->lock){+.+.}-{4:4}:
> >> lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
> >> __mutex_lock_common kernel/locking/mutex.c:585 [inline]
> >> __mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
> >> bch2_btree_cache_scan+0x184/0xec0 fs/bcachefs/btree_cache.c:482
> >> do_shrink_slab+0x72d/0x1160 mm/shrinker.c:437
> >> shrink_slab+0x1093/0x14d0 mm/shrinker.c:664
> >> shrink_one+0x43b/0x850 mm/vmscan.c:4868
> >> shrink_many mm/vmscan.c:4929 [inline]
> >> lru_gen_shrink_node mm/vmscan.c:5007 [inline]
> >> shrink_node+0x37c5/0x3e50 mm/vmscan.c:5978
> >> kswapd_shrink_node mm/vmscan.c:6807 [inline]
> >> balance_pgdat mm/vmscan.c:6999 [inline]
> >> kswapd+0x20f3/0x3b10 mm/vmscan.c:7264
> >> kthread+0x7a9/0x920 kernel/kthread.c:464
> >> ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
> >> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
> >>
> >> -> #1 (fs_reclaim){+.+.}-{0:0}:
> >> lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
> >> __fs_reclaim_acquire mm/page_alloc.c:3853 [inline]
> >> fs_reclaim_acquire+0x88/0x130 mm/page_alloc.c:3867
> >> might_alloc include/linux/sched/mm.h:318 [inline]
> >> slab_pre_alloc_hook mm/slub.c:4066 [inline]
> >> slab_alloc_node mm/slub.c:4144 [inline]
> >> __do_kmalloc_node mm/slub.c:4293 [inline]
> >> __kmalloc_noprof+0xae/0x4c0 mm/slub.c:4306
> >> kmalloc_noprof include/linux/slab.h:905 [inline]
> >> kzalloc_noprof include/linux/slab.h:1037 [inline]
> >> pcpu_mem_zalloc mm/percpu.c:510 [inline]
> >> pcpu_alloc_chunk mm/percpu.c:1430 [inline]
> >> pcpu_create_chunk+0x57/0xbc0 mm/percpu-vm.c:338
> >> pcpu_balance_populated mm/percpu.c:2063 [inline]
> >> pcpu_balance_workfn+0xc4d/0xd40 mm/percpu.c:2200
> >> process_one_work kernel/workqueue.c:3236 [inline]
> >> process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3317
> >> worker_thread+0x870/0xd30 kernel/workqueue.c:3398
> >> kthread+0x7a9/0x920 kernel/kthread.c:464
> >> ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
> >> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>
> Seeing this as part of the chain (fs reclaim from a worker doing
> pcpu_balance_workfn) makes me think Michal's patch could be a fix to this:
>
> https://lore.kernel.org/all/20250206122633.167896-1-mhocko@kernel.org/
Thanks for the link - that does look like just the thing.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] bcachefs: Use alloc_percpu_gfp to avoid deadlock
2025-02-20 20:37 ` Kent Overstreet
@ 2025-02-21 2:46 ` Dennis Zhou
2025-02-21 7:21 ` Vlastimil Babka
2025-02-21 19:44 ` Alan Huang
0 siblings, 2 replies; 9+ messages in thread
From: Dennis Zhou @ 2025-02-21 2:46 UTC (permalink / raw)
To: Kent Overstreet
Cc: Vlastimil Babka, Alan Huang, linux-bcachefs,
syzbot+fe63f377148a6371a9db, linux-mm, Tejun Heo,
Christoph Lameter, Michal Hocko
Hello,
On Thu, Feb 20, 2025 at 03:37:26PM -0500, Kent Overstreet wrote:
> On Thu, Feb 20, 2025 at 06:16:43PM +0100, Vlastimil Babka wrote:
> > On 2/20/25 11:57, Alan Huang wrote:
> > > Ping
> > >
> > >> On Feb 12, 2025, at 22:27, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> > >>
> > >> Adding pcpu people to the CC
> > >>
> > >> On Wed, Feb 12, 2025 at 06:06:25PM +0800, Alan Huang wrote:
> > >>> The cycle:
> > >>>
> > >>> CPU0: CPU1:
> > >>> bc->lock pcpu_alloc_mutex
> > >>> pcpu_alloc_mutex bc->lock
> > >>>
> > >>> Reported-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
> > >>> Tested-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
> > >>> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
> > >>
> > >> So pcpu_alloc_mutex -> fs_reclaim?
> > >>
> > >> That's really awkward; seems like something that might invite more
> > >> issues. We can apply your fix if we need to, but I want to hear with the
> > >> percpu people have to say first.
> > >>
> > >> ======================================================
> > >> WARNING: possible circular locking dependency detected
> > >> 6.14.0-rc2-syzkaller-00039-g09fbf3d50205 #0 Not tainted
> > >> ------------------------------------------------------
> > >> syz.0.21/5625 is trying to acquire lock:
> > >> ffffffff8ea19608 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
> > >>
> > >> but task is already holding lock:
> > >> ffff888051401c68 (&bc->lock){+.+.}-{4:4}, at: bch2_btree_node_mem_alloc+0x559/0x16f0 fs/bcachefs/btree_cache.c:804
> > >>
> > >> which lock already depends on the new lock.
> > >>
> > >>
> > >> the existing dependency chain (in reverse order) is:
> > >>
> > >> -> #2 (&bc->lock){+.+.}-{4:4}:
> > >> lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
> > >> __mutex_lock_common kernel/locking/mutex.c:585 [inline]
> > >> __mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
> > >> bch2_btree_cache_scan+0x184/0xec0 fs/bcachefs/btree_cache.c:482
> > >> do_shrink_slab+0x72d/0x1160 mm/shrinker.c:437
> > >> shrink_slab+0x1093/0x14d0 mm/shrinker.c:664
> > >> shrink_one+0x43b/0x850 mm/vmscan.c:4868
> > >> shrink_many mm/vmscan.c:4929 [inline]
> > >> lru_gen_shrink_node mm/vmscan.c:5007 [inline]
> > >> shrink_node+0x37c5/0x3e50 mm/vmscan.c:5978
> > >> kswapd_shrink_node mm/vmscan.c:6807 [inline]
> > >> balance_pgdat mm/vmscan.c:6999 [inline]
> > >> kswapd+0x20f3/0x3b10 mm/vmscan.c:7264
> > >> kthread+0x7a9/0x920 kernel/kthread.c:464
> > >> ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
> > >> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
> > >>
> > >> -> #1 (fs_reclaim){+.+.}-{0:0}:
> > >> lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
> > >> __fs_reclaim_acquire mm/page_alloc.c:3853 [inline]
> > >> fs_reclaim_acquire+0x88/0x130 mm/page_alloc.c:3867
> > >> might_alloc include/linux/sched/mm.h:318 [inline]
> > >> slab_pre_alloc_hook mm/slub.c:4066 [inline]
> > >> slab_alloc_node mm/slub.c:4144 [inline]
> > >> __do_kmalloc_node mm/slub.c:4293 [inline]
> > >> __kmalloc_noprof+0xae/0x4c0 mm/slub.c:4306
> > >> kmalloc_noprof include/linux/slab.h:905 [inline]
> > >> kzalloc_noprof include/linux/slab.h:1037 [inline]
> > >> pcpu_mem_zalloc mm/percpu.c:510 [inline]
> > >> pcpu_alloc_chunk mm/percpu.c:1430 [inline]
> > >> pcpu_create_chunk+0x57/0xbc0 mm/percpu-vm.c:338
> > >> pcpu_balance_populated mm/percpu.c:2063 [inline]
> > >> pcpu_balance_workfn+0xc4d/0xd40 mm/percpu.c:2200
> > >> process_one_work kernel/workqueue.c:3236 [inline]
> > >> process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3317
> > >> worker_thread+0x870/0xd30 kernel/workqueue.c:3398
> > >> kthread+0x7a9/0x920 kernel/kthread.c:464
> > >> ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
> > >> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
> >
> > Seeing this as part of the chain (fs reclaim from a worker doing
> > pcpu_balance_workfn) makes me think Michal's patch could be a fix to this:
> >
> > https://lore.kernel.org/all/20250206122633.167896-1-mhocko@kernel.org/
>
> Thanks for the link - that does look like just the thing.
Sorry, I missed the first email asking me to weigh in.
Michal's problem is a little different from what's happening here.
He's hitting an issue where an alloc_percpu_gfp(NOFS/NOIO) call is
considered atomic and fails during probing, because we don't have
enough backed percpu memory to fulfill the "atomic" requests.
Historically we've considered any allocation that's not GFP_KERNEL to be
atomic. Here it seems like the alloc_percpu() done under bc->lock should
have been an "atomic" allocation to prevent the lock cycle?
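To make that concrete, here's a rough sketch of the classification rule
described above (the helper name is made up for illustration; this is not
the literal mm/percpu.c code):

	#include <linux/gfp.h>

	/*
	 * Illustration only: any mask that is not the full GFP_KERNEL is
	 * treated as an atomic percpu request, which is served from
	 * already-populated chunk space and never takes pcpu_alloc_mutex
	 * or enters reclaim itself.
	 */
	static bool pcpu_request_is_atomic(gfp_t gfp)
	{
		return (gfp & GFP_KERNEL) != GFP_KERNEL;
	}

	/*
	 * So GFP_NOWAIT, GFP_NOFS and GFP_NOIO all classify as atomic here;
	 * only a full GFP_KERNEL request may block on pcpu_alloc_mutex.
	 */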
Thanks,
Dennis
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] bcachefs: Use alloc_percpu_gfp to avoid deadlock
2025-02-21 2:46 ` Dennis Zhou
@ 2025-02-21 7:21 ` Vlastimil Babka
2025-02-21 19:44 ` Alan Huang
1 sibling, 0 replies; 9+ messages in thread
From: Vlastimil Babka @ 2025-02-21 7:21 UTC (permalink / raw)
To: Dennis Zhou, Kent Overstreet
Cc: Alan Huang, linux-bcachefs, syzbot+fe63f377148a6371a9db,
linux-mm, Tejun Heo, Christoph Lameter, Michal Hocko
On 2/21/25 03:46, Dennis Zhou wrote:
> Hello,
>
> On Thu, Feb 20, 2025 at 03:37:26PM -0500, Kent Overstreet wrote:
>> On Thu, Feb 20, 2025 at 06:16:43PM +0100, Vlastimil Babka wrote:
>> > On 2/20/25 11:57, Alan Huang wrote:
>> > > Ping
>> > >
>> > >> On Feb 12, 2025, at 22:27, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>> > >>
>> > >> Adding pcpu people to the CC
>> > >>
>> > >> On Wed, Feb 12, 2025 at 06:06:25PM +0800, Alan Huang wrote:
>> > >>> The cycle:
>> > >>>
>> > >>> CPU0: CPU1:
>> > >>> bc->lock pcpu_alloc_mutex
>> > >>> pcpu_alloc_mutex bc->lock
>> > >>>
>> > >>> Reported-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
>> > >>> Tested-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
>> > >>> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
>> > >>
>> > >> So pcpu_alloc_mutex -> fs_reclaim?
>> > >>
>> > >> That's really awkward; seems like something that might invite more
>> > >> issues. We can apply your fix if we need to, but I want to hear with the
>> > >> percpu people have to say first.
>> > >>
>> > >> ======================================================
>> > >> WARNING: possible circular locking dependency detected
>> > >> 6.14.0-rc2-syzkaller-00039-g09fbf3d50205 #0 Not tainted
>> > >> ------------------------------------------------------
>> > >> syz.0.21/5625 is trying to acquire lock:
>> > >> ffffffff8ea19608 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
>> > >>
>> > >> but task is already holding lock:
>> > >> ffff888051401c68 (&bc->lock){+.+.}-{4:4}, at: bch2_btree_node_mem_alloc+0x559/0x16f0 fs/bcachefs/btree_cache.c:804
>> > >>
>> > >> which lock already depends on the new lock.
>> > >>
>> > >>
>> > >> the existing dependency chain (in reverse order) is:
>> > >>
>> > >> -> #2 (&bc->lock){+.+.}-{4:4}:
>> > >> lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
>> > >> __mutex_lock_common kernel/locking/mutex.c:585 [inline]
>> > >> __mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
>> > >> bch2_btree_cache_scan+0x184/0xec0 fs/bcachefs/btree_cache.c:482
>> > >> do_shrink_slab+0x72d/0x1160 mm/shrinker.c:437
>> > >> shrink_slab+0x1093/0x14d0 mm/shrinker.c:664
>> > >> shrink_one+0x43b/0x850 mm/vmscan.c:4868
>> > >> shrink_many mm/vmscan.c:4929 [inline]
>> > >> lru_gen_shrink_node mm/vmscan.c:5007 [inline]
>> > >> shrink_node+0x37c5/0x3e50 mm/vmscan.c:5978
>> > >> kswapd_shrink_node mm/vmscan.c:6807 [inline]
>> > >> balance_pgdat mm/vmscan.c:6999 [inline]
>> > >> kswapd+0x20f3/0x3b10 mm/vmscan.c:7264
>> > >> kthread+0x7a9/0x920 kernel/kthread.c:464
>> > >> ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
>> > >> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>> > >>
>> > >> -> #1 (fs_reclaim){+.+.}-{0:0}:
>> > >> lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
>> > >> __fs_reclaim_acquire mm/page_alloc.c:3853 [inline]
>> > >> fs_reclaim_acquire+0x88/0x130 mm/page_alloc.c:3867
>> > >> might_alloc include/linux/sched/mm.h:318 [inline]
>> > >> slab_pre_alloc_hook mm/slub.c:4066 [inline]
>> > >> slab_alloc_node mm/slub.c:4144 [inline]
>> > >> __do_kmalloc_node mm/slub.c:4293 [inline]
>> > >> __kmalloc_noprof+0xae/0x4c0 mm/slub.c:4306
>> > >> kmalloc_noprof include/linux/slab.h:905 [inline]
>> > >> kzalloc_noprof include/linux/slab.h:1037 [inline]
>> > >> pcpu_mem_zalloc mm/percpu.c:510 [inline]
>> > >> pcpu_alloc_chunk mm/percpu.c:1430 [inline]
>> > >> pcpu_create_chunk+0x57/0xbc0 mm/percpu-vm.c:338
>> > >> pcpu_balance_populated mm/percpu.c:2063 [inline]
>> > >> pcpu_balance_workfn+0xc4d/0xd40 mm/percpu.c:2200
>> > >> process_one_work kernel/workqueue.c:3236 [inline]
>> > >> process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3317
>> > >> worker_thread+0x870/0xd30 kernel/workqueue.c:3398
>> > >> kthread+0x7a9/0x920 kernel/kthread.c:464
>> > >> ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
>> > >> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>> >
>> > Seeing this as part of the chain (fs reclaim from a worker doing
>> > pcpu_balance_workfn) makes me think Michal's patch could be a fix to this:
>> >
>> > https://lore.kernel.org/all/20250206122633.167896-1-mhocko@kernel.org/
>>
>> Thanks for the link - that does look like just the thing.
>
> Sorry I missed the first email asking to weigh in.
>
> Michal's problem is a little bit different than what's happening here.
Yes, but it's related enough. He mentions commit 28307d938fb2, and there
you'll find a similar kind of lockdep splat.
> He's having an issue where a alloc_percpu_gfp(NOFS/NOIO) is considered
> atomic and failing during probing. This is because we don't have enough
> percpu memory backed to fulfill the "atomic" requests.
That, and we don't allow NOFS/NOIO allocations to take pcpu_alloc_mutex,
to avoid deadlocking against pcpu_balance_workfn, which takes it and then
does __GFP_FS reclaim.
> Historically we've considered any allocation that's not GFP_KERNEL to be
> atomic. Here it seems like the alloc_percpu() behind the bc->lock()
> should have been an "atomic" allocation to prevent the lock cycle?
That's what the original mail/patch in this thread suggested:
https://lore.kernel.org/all/20250212100625.55860-1-mmpgouride@gmail.com/
Note it proposes GFP_NOWAIT; possibly GFP_NOFS would be enough, but then the
current implementation would make it "atomic" anyway.
But then it could end up failing like the allocations that motivated
Michal's patch? So with Michal's approach we can avoid having to weaken
pcpu_alloc() callers like this. Yes, it's counter-intuitive that we weaken
a kworker context instead, which normally has no restrictions. But it's not
weakened (NOIO) nearly as much as pcpu_alloc() users that are effectively
atomic because they can't take pcpu_alloc_mutex at all.
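Roughly, the shape of that weakening is something like this (just a sketch
of the idea, not Michal's actual patch; memalloc_noio_save()/restore() are
the standard scope helpers):

	#include <linux/sched/mm.h>
	#include <linux/workqueue.h>

	/*
	 * Sketch: run the percpu balance work under a NOIO scope, so the
	 * allocations it does while holding pcpu_alloc_mutex are no longer
	 * __GFP_FS/__GFP_IO contexts. That removes the
	 * pcpu_alloc_mutex -> fs_reclaim edge from the chain above.
	 */
	static void pcpu_balance_workfn_sketch(struct work_struct *work)
	{
		unsigned int noio_flags = memalloc_noio_save();

		/* ... existing chunk depopulation/population work ... */

		memalloc_noio_restore(noio_flags);
	}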
> Thanks,
> Dennis
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] bcachefs: Use alloc_percpu_gfp to avoid deadlock
2025-02-21 2:46 ` Dennis Zhou
2025-02-21 7:21 ` Vlastimil Babka
@ 2025-02-21 19:44 ` Alan Huang
1 sibling, 0 replies; 9+ messages in thread
From: Alan Huang @ 2025-02-21 19:44 UTC (permalink / raw)
To: Dennis Zhou
Cc: Kent Overstreet, Vlastimil Babka, linux-bcachefs,
syzbot+fe63f377148a6371a9db, linux-mm, Tejun Heo,
Christoph Lameter, Michal Hocko
On Feb 21, 2025, at 10:46, Dennis Zhou <dennis@kernel.org> wrote:
>
> Hello,
>
> On Thu, Feb 20, 2025 at 03:37:26PM -0500, Kent Overstreet wrote:
>> On Thu, Feb 20, 2025 at 06:16:43PM +0100, Vlastimil Babka wrote:
>>> On 2/20/25 11:57, Alan Huang wrote:
>>>> Ping
>>>>
>>>>> On Feb 12, 2025, at 22:27, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>>>>>
>>>>> Adding pcpu people to the CC
>>>>>
>>>>> On Wed, Feb 12, 2025 at 06:06:25PM +0800, Alan Huang wrote:
>>>>>> The cycle:
>>>>>>
>>>>>> CPU0: CPU1:
>>>>>> bc->lock pcpu_alloc_mutex
>>>>>> pcpu_alloc_mutex bc->lock
>>>>>>
>>>>>> Reported-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
>>>>>> Tested-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
>>>>>> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
>>>>>
>>>>> So pcpu_alloc_mutex -> fs_reclaim?
>>>>>
>>>>> That's really awkward; seems like something that might invite more
>>>>> issues. We can apply your fix if we need to, but I want to hear with the
>>>>> percpu people have to say first.
>>>>>
>>>>> ======================================================
>>>>> WARNING: possible circular locking dependency detected
>>>>> 6.14.0-rc2-syzkaller-00039-g09fbf3d50205 #0 Not tainted
>>>>> ------------------------------------------------------
>>>>> syz.0.21/5625 is trying to acquire lock:
>>>>> ffffffff8ea19608 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
>>>>>
>>>>> but task is already holding lock:
>>>>> ffff888051401c68 (&bc->lock){+.+.}-{4:4}, at: bch2_btree_node_mem_alloc+0x559/0x16f0 fs/bcachefs/btree_cache.c:804
>>>>>
>>>>> which lock already depends on the new lock.
>>>>>
>>>>>
>>>>> the existing dependency chain (in reverse order) is:
>>>>>
>>>>> -> #2 (&bc->lock){+.+.}-{4:4}:
>>>>> lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
>>>>> __mutex_lock_common kernel/locking/mutex.c:585 [inline]
>>>>> __mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
>>>>> bch2_btree_cache_scan+0x184/0xec0 fs/bcachefs/btree_cache.c:482
>>>>> do_shrink_slab+0x72d/0x1160 mm/shrinker.c:437
>>>>> shrink_slab+0x1093/0x14d0 mm/shrinker.c:664
>>>>> shrink_one+0x43b/0x850 mm/vmscan.c:4868
>>>>> shrink_many mm/vmscan.c:4929 [inline]
>>>>> lru_gen_shrink_node mm/vmscan.c:5007 [inline]
>>>>> shrink_node+0x37c5/0x3e50 mm/vmscan.c:5978
>>>>> kswapd_shrink_node mm/vmscan.c:6807 [inline]
>>>>> balance_pgdat mm/vmscan.c:6999 [inline]
>>>>> kswapd+0x20f3/0x3b10 mm/vmscan.c:7264
>>>>> kthread+0x7a9/0x920 kernel/kthread.c:464
>>>>> ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
>>>>> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>>>>>
>>>>> -> #1 (fs_reclaim){+.+.}-{0:0}:
>>>>> lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
>>>>> __fs_reclaim_acquire mm/page_alloc.c:3853 [inline]
>>>>> fs_reclaim_acquire+0x88/0x130 mm/page_alloc.c:3867
>>>>> might_alloc include/linux/sched/mm.h:318 [inline]
>>>>> slab_pre_alloc_hook mm/slub.c:4066 [inline]
>>>>> slab_alloc_node mm/slub.c:4144 [inline]
>>>>> __do_kmalloc_node mm/slub.c:4293 [inline]
>>>>> __kmalloc_noprof+0xae/0x4c0 mm/slub.c:4306
>>>>> kmalloc_noprof include/linux/slab.h:905 [inline]
>>>>> kzalloc_noprof include/linux/slab.h:1037 [inline]
>>>>> pcpu_mem_zalloc mm/percpu.c:510 [inline]
>>>>> pcpu_alloc_chunk mm/percpu.c:1430 [inline]
>>>>> pcpu_create_chunk+0x57/0xbc0 mm/percpu-vm.c:338
>>>>> pcpu_balance_populated mm/percpu.c:2063 [inline]
>>>>> pcpu_balance_workfn+0xc4d/0xd40 mm/percpu.c:2200
>>>>> process_one_work kernel/workqueue.c:3236 [inline]
>>>>> process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3317
>>>>> worker_thread+0x870/0xd30 kernel/workqueue.c:3398
>>>>> kthread+0x7a9/0x920 kernel/kthread.c:464
>>>>> ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
>>>>> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>>>
>>> Seeing this as part of the chain (fs reclaim from a worker doing
>>> pcpu_balance_workfn) makes me think Michal's patch could be a fix to this:
>>>
>>> https://lore.kernel.org/all/20250206122633.167896-1-mhocko@kernel.org/
>>
>> Thanks for the link - that does look like just the thing.
>
> Sorry I missed the first email asking to weigh in.
>
> Michal's problem is a little bit different than what's happening here.
> He's having an issue where a alloc_percpu_gfp(NOFS/NOIO) is considered
> atomic and failing during probing. This is because we don't have enough
> percpu memory backed to fulfill the "atomic" requests.
>
> Historically we've considered any allocation that's not GFP_KERNEL to be
> atomic. Here it seems like the alloc_percpu() behind the bc->lock()
> should have been an "atomic" allocation to prevent the lock cycle?
I think so. If I understand it correctly, NOFS/NOIO could still invoke the
shrinker, so we could end up taking bc->lock again. And I think we should
not rely on the implementation of alloc_percpu_gfp, but on the GFP flags
instead.
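For example, something like this at the call site would keep the constraint
in the flags themselves (illustrative shape only, hypothetical helper, not
the exact bcachefs context):

	#include <linux/percpu.h>

	static void six_lock_init_pcpu_readers(struct six_lock *lock)
	{
		/*
		 * GFP_NOWAIT states explicitly that we must not enter
		 * reclaim here (we may be under bc->lock), instead of
		 * relying on percpu internally treating any non-GFP_KERNEL
		 * mask as atomic. A NULL lock->readers is tolerated: the
		 * six lock simply runs without the percpu-reader
		 * optimization, as the patch comment notes.
		 */
		lock->readers = alloc_percpu_gfp(unsigned,
						 GFP_NOWAIT | __GFP_NOWARN);
	}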
Correct me if I'm wrong.
> Thanks,
> Dennis
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-02-21 19:44 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20250212100625.55860-1-mmpgouride@gmail.com>
2025-02-12 14:27 ` [PATCH] bcachefs: Use alloc_percpu_gfp to avoid deadlock Kent Overstreet
2025-02-20 10:57 ` Alan Huang
2025-02-20 12:40 ` Kent Overstreet
2025-02-20 12:44 ` Alan Huang
2025-02-20 17:16 ` Vlastimil Babka
2025-02-20 20:37 ` Kent Overstreet
2025-02-21 2:46 ` Dennis Zhou
2025-02-21 7:21 ` Vlastimil Babka
2025-02-21 19:44 ` Alan Huang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox