scheduling while atomic on rc3 - migration + buffer heads

* scheduling while atomic on rc3 - migration + buffer heads
@ 2025-04-21 15:14 Kent Overstreet
  2025-04-21 15:47 ` Raghavendra K T
  2025-04-21 17:27 ` Darrick J. Wong
  0 siblings, 2 replies; 4+ messages in thread
From: Kent Overstreet @ 2025-04-21 15:14 UTC (permalink / raw)
  To: linux-mm, linux-ext4, linux-fsdevel

This just popped up in one of my test runs.

Given that it's buffer heads, it has to be the ext4 root filesystem, not
bcachefs.

00465 ========= TEST   lz4_buffered
00465 
00465 WATCHDOG 360
00466 bcachefs (vdb): starting version 1.25: extent_flags opts=errors=panic,compression=lz4
00466 bcachefs (vdb): initializing new filesystem
00466 bcachefs (vdb): going read-write
00466 bcachefs (vdb): marking superblocks
00466 bcachefs (vdb): initializing freespace
00466 bcachefs (vdb): done initializing freespace
00466 bcachefs (vdb): reading snapshots table
00466 bcachefs (vdb): reading snapshots done
00466 bcachefs (vdb): done starting filesystem
00466 starting copy
00515 BUG: sleeping function called from invalid context at mm/util.c:743
00515 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 120, name: kcompactd0
00515 preempt_count: 1, expected: 0
00515 RCU nest depth: 0, expected: 0
00515 1 lock held by kcompactd0/120:
00515  #0: ffffff80c0c558f0 (&mapping->i_private_lock){+.+.}-{3:3}, at: __buffer_migrate_folio+0x114/0x298
00515 Preemption disabled at:
00515 [<ffffffc08025fa84>] __buffer_migrate_folio+0x114/0x298
00515 CPU: 11 UID: 0 PID: 120 Comm: kcompactd0 Not tainted 6.15.0-rc3-ktest-gb2a78fdf7d2f #20530 PREEMPT 
00515 Hardware name: linux,dummy-virt (DT)
00515 Call trace:
00515  show_stack+0x1c/0x30 (C)
00515  dump_stack_lvl+0xb0/0xc0
00515  dump_stack+0x14/0x20
00515  __might_resched+0x180/0x288
00515  folio_mc_copy+0x54/0x98
00515  __migrate_folio.isra.0+0x68/0x168
00515  __buffer_migrate_folio+0x280/0x298
00515  buffer_migrate_folio_norefs+0x18/0x28
00515  migrate_pages_batch+0x94c/0xeb8
00515  migrate_pages_sync+0x84/0x240
00515  migrate_pages+0x284/0x698
00515  compact_zone+0xa40/0x10f8
00515  kcompactd_do_work+0x204/0x498
00515  kcompactd+0x3c4/0x400
00515  kthread+0x13c/0x208
00515  ret_from_fork+0x10/0x20
00518 starting sync
00519 starting rm
00520 ========= FAILED TIMEOUT lz4_buffered in 360s




* Re: scheduling while atomic on rc3 - migration + buffer heads
  2025-04-21 15:14 scheduling while atomic on rc3 - migration + buffer heads Kent Overstreet
@ 2025-04-21 15:47 ` Raghavendra K T
  2025-04-21 15:55   ` Kent Overstreet
  2025-04-21 17:27 ` Darrick J. Wong
  1 sibling, 1 reply; 4+ messages in thread
From: Raghavendra K T @ 2025-04-21 15:47 UTC (permalink / raw)
  To: Kent Overstreet, linux-mm, linux-ext4, linux-fsdevel; +Cc: wqu

On 4/21/2025 8:44 PM, Kent Overstreet wrote:

+Qu, as I have seen a similar report from him.

> This just popped up in one of my test runs.
> 
> Given that it's buffer heads, it has to be the ext4 root filesystem, not
> bcachefs.
> 

I have also seen a similar stack with folio_mc_copy() while testing the
PTE A bit patches.

IIUC, it has something to do with cond_resched() being called from
folio_mc_copy().

(Thomas (tglx) mentioned long ago that cond_resched() has no scope
awareness.) I am not sure where the fix should go in these cases.

(I mean, should the caller of migrate_folio call it with a mutex held
rather than a spinlock?)
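
To illustrate the check that fired, here is a minimal userspace sketch.
It assumes the copy loop has a reschedule point between pages, which is
what the folio_mc_copy -> __might_resched frames in the trace suggest.
All model_* names are hypothetical and are not kernel APIs. The point is
only that the check sees a nonzero preempt count and cannot tell which
lock or scope raised it.

```c
/* Userspace model only: none of these are the kernel's real definitions. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int model_preempt_count;	/* stands in for preempt_count() */
static int model_bug_reports;	/* how many times the check fired */

/* Taking a spinlock disables preemption, i.e. bumps the count. */
static void model_spin_lock(void)
{
	model_preempt_count++;
}

static void model_spin_unlock(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

/* The check knows only the count, not who raised it or why. */
static bool model_cond_resched(const char *where)
{
	if (model_preempt_count == 0)
		return true;	/* safe to reschedule here */

	model_bug_reports++;
	fprintf(stderr, "BUG: sleeping function called from invalid context at %s\n",
		where);
	fprintf(stderr, "preempt_count: %d, expected: 0\n", model_preempt_count);
	return false;
}

/* Copy loop with a reschedule point between pages (assumed shape). */
static void model_folio_copy(int nr_pages)
{
	for (int i = 0; i < nr_pages; i++) {
		/* a page copy would happen here */
		if (i + 1 < nr_pages)
			model_cond_resched("model_folio_copy");
	}
}
```

In this model, calling model_spin_lock(), model_folio_copy(4) and
model_spin_unlock() logs the warning at each of the three reschedule
points, while the same copy without the lock logs nothing. A one-page
copy never reaches a reschedule point at all.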

Regards
- Raghu









* Re: scheduling while atomic on rc3 - migration + buffer heads
  2025-04-21 15:47 ` Raghavendra K T
@ 2025-04-21 15:55   ` Kent Overstreet
  0 siblings, 0 replies; 4+ messages in thread
From: Kent Overstreet @ 2025-04-21 15:55 UTC (permalink / raw)
  To: Raghavendra K T; +Cc: linux-mm, linux-ext4, linux-fsdevel, wqu

On Mon, Apr 21, 2025 at 09:17:18PM +0530, Raghavendra K T wrote:
> On 4/21/2025 8:44 PM, Kent Overstreet wrote:
> 
> +Qu, as I have seen a similar report from him.
> 
> I have also seen a similar stack with folio_mc_copy() while testing the
> PTE A bit patches.
> 
> IIUC, it has something to do with cond_resched() being called from
> folio_mc_copy().
> 
> (Thomas (tglx) mentioned long ago that cond_resched() has no scope
> awareness.) I am not sure where the fix should go in these cases.

That's true, calling cond_resched() while a spinlock is held is a bug.

> (I mean, should the caller of migrate_folio call it with a mutex held
> rather than a spinlock?)

Yes. migrate_folio() does large data copies, so we don't want all that
running in atomic context.
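
As a rough userspace sketch of that point, assuming the only difference
between the two variants is which kind of lock is held around the copy:
the model below counts how many bytes get copied while a "spinlock" is
held. Every model_* name is hypothetical, and this is not how
__buffer_migrate_folio is written or how it should be fixed.

```c
/* Userspace model only; every name here is hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

static int model_atomic_depth;			/* > 0 while a "spinlock" is held */
static size_t model_bytes_copied_atomic;	/* bytes copied with depth > 0 */

static void model_spin_lock(void)
{
	model_atomic_depth++;
}

static void model_spin_unlock(void)
{
	assert(model_atomic_depth > 0);
	model_atomic_depth--;
}

/* A sleeping lock leaves the atomic depth untouched. */
static void model_mutex_lock(void) { }
static void model_mutex_unlock(void) { }

/* Page-by-page copy that records how much work ran in atomic context. */
static void model_copy(char *dst, const char *src, size_t nr_pages)
{
	for (size_t i = 0; i < nr_pages; i++) {
		memcpy(dst + i * MODEL_PAGE_SIZE, src + i * MODEL_PAGE_SIZE,
		       MODEL_PAGE_SIZE);
		if (model_atomic_depth > 0)
			model_bytes_copied_atomic += MODEL_PAGE_SIZE;
	}
}

/* The pattern from the report: the whole copy runs under a spinlock. */
static void model_migrate_spinlocked(char *dst, const char *src, size_t nr_pages)
{
	model_spin_lock();
	model_copy(dst, src, nr_pages);
	model_spin_unlock();
}

/* The suggested shape: the copy runs where sleeping is allowed. */
static void model_migrate_sleepable(char *dst, const char *src, size_t nr_pages)
{
	model_mutex_lock();
	model_copy(dst, src, nr_pages);
	model_mutex_unlock();
}
```

With the sleepable variant the atomic byte counter stays at zero no
matter how large the copy is. With the spinlocked variant it grows with
the folio size, which is the cost we want to avoid.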



* Re: scheduling while atomic on rc3 - migration + buffer heads
  2025-04-21 15:14 scheduling while atomic on rc3 - migration + buffer heads Kent Overstreet
  2025-04-21 15:47 ` Raghavendra K T
@ 2025-04-21 17:27 ` Darrick J. Wong
  1 sibling, 0 replies; 4+ messages in thread
From: Darrick J. Wong @ 2025-04-21 17:27 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-mm, linux-ext4, linux-fsdevel

On Mon, Apr 21, 2025 at 11:14:44AM -0400, Kent Overstreet wrote:
> This just popped up in one of my test runs.
> 
> Given that it's buffer heads, it has to be the ext4 root filesystem, not
> bcachefs.

Wrong.  When udev calls libblkid, which reads the (mounted) bdev to
figure out that there's a bcachefs filesystem on it, that still creates
bufferheads, and possibly very large ones.

willy's temporary workaround in
https://lore.kernel.org/linux-fsdevel/Z_VwF1MA-R7MgDVG@casper.infradead.org/

shuts all that up enough to move on to triaging the rest of the
bleeding.

--D


