linux-mm.kvack.org archive mirror
* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
       [not found]   ` <Z9kEdPLNT8SOyOQT@xsang-OptiPlex-9020>
@ 2025-03-18  8:15     ` Luis Chamberlain
  2025-03-18 14:37       ` Matthew Wilcox
  2025-03-20  1:24       ` Lai, Yi
  0 siblings, 2 replies; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-18  8:15 UTC (permalink / raw)
  To: Oliver Sang, David Hildenbrand, Alistair Popple, linux-mm
  Cc: Christian Brauner, Hannes Reinecke, oe-lkp, lkp,
	Matthew Wilcox (Oracle),
	John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez

On Tue, Mar 18, 2025 at 01:28:20PM +0800, Oliver Sang wrote:
> hi, Christian Brauner,
> 
> On Tue, Mar 11, 2025 at 01:10:43PM +0100, Christian Brauner wrote:
> > On Mon, Mar 10, 2025 at 03:43:49PM +0800, kernel test robot wrote:
> > > 
> > > 
> > > Hello,
> > > 
> > > kernel test robot noticed "BUG:sleeping_function_called_from_invalid_context_at_mm/util.c" on:
> > > 
> > > commit: 3c20917120ce61f2a123ca0810293872f4c6b5a4 ("block/bdev: enable large folio support for large logical block sizes")
> > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > 
> > Is this also already fixed by:
> > 
> > commit a64e5a596067 ("bdev: add back PAGE_SIZE block size validation for sb_set_blocksize()")
> > 
> > ?
> 
> sorry for the late reply.
> 
> commit a64e5a596067 cannot fix the issue. one dmesg is attached FYI.
> 
> we also tried to check the linux-next/master tip, but neither of the commits
> below can boot successfully in our env, which we need to check further.
> 
> da920b7df70177 (tag: next-20250314, linux-next/master) Add linux-next specific files for 20250314
> 
> e94bd4ec45ac1 (tag: next-20250317, linux-next/master) Add linux-next specific files for 20250317
> 
> so we are not sure of the status of the latest linux-next/master.
> 
> if you want us to check other commits or other patches, please let us know. thanks!

I cannot reproduce the issue by running the LTP test manually in a loop
for a long time:

export LTP_RUNTIME_MUL=2

while true; do \
	./testcases/kernel/syscalls/close_range/close_range01; done

What's the failure rate of just running the test alone above?
Does it always fail on this system? Is this a deterministic failure
or does it have a lower failure rate?

I also can't see how the patch ("block/bdev: enable large folio
support for large logical block sizes") would trigger this.

You could try this patch but ...

https://lore.kernel.org/all/20250312050028.1784117-1-mcgrof@kernel.org/

we decided this is not right and not needed, and if we have a buggy
block driver we can address that.

I just can't see this LTP test actually doing anything funky with block
devices at all.

The associated sleeping while atomic warning is triggered during
compaction though:

[  218.143642][  T299] Architecture:                         x86_64
[  218.143659][  T299] 
[  218.427851][   T51] BUG: sleeping function called from invalid context at mm/util.c:901
[  218.435981][   T51] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 51, name: kcompactd0
[  218.444773][   T51] preempt_count: 1, expected: 0
[  218.449601][   T51] RCU nest depth: 0, expected: 0
[  218.454476][   T51] CPU: 2 UID: 0 PID: 51 Comm: kcompactd0 Tainted: G S                 6.14.0-rc1-00006-g3c20917120ce #1
[  218.454486][   T51] Tainted: [S]=CPU_OUT_OF_SPEC
[  218.454488][   T51] Hardware name: Hewlett-Packard HP Pro 3340 MT/17A1, BIOS 8.07 01/24/2013
[  218.454492][   T51] Call Trace:
[  218.454495][   T51]  <TASK>
[  218.454498][   T51]  dump_stack_lvl+0x4f/0x70
[  218.454508][   T51]  __might_resched+0x2c6/0x450
[  218.454517][   T51]  folio_mc_copy+0xca/0x1f0
[  218.454525][   T51]  ? _raw_spin_lock+0x81/0xe0
[  218.454532][   T51]  __migrate_folio+0x11a/0x2d0
[  218.454541][   T51]  __buffer_migrate_folio+0x558/0x660
[  218.454548][   T51]  move_to_new_folio+0xf5/0x410
[  218.454555][   T51]  migrate_folio_move+0x211/0x770
[  218.454562][   T51]  ? __pfx_compaction_free+0x10/0x10
[  218.454572][   T51]  ? __pfx_migrate_folio_move+0x10/0x10
[  218.454578][   T51]  ? compaction_alloc_noprof+0x441/0x720
[  218.454587][   T51]  ? __pfx_compaction_alloc+0x10/0x10
[  218.454594][   T51]  ? __pfx_compaction_free+0x10/0x10
[  218.454601][   T51]  ? __pfx_compaction_free+0x10/0x10
[  218.454607][   T51]  ? migrate_folio_unmap+0x329/0x890
[  218.454614][   T51]  migrate_pages_batch+0xddf/0x1810
[  218.454621][   T51]  ? __pfx_compaction_free+0x10/0x10
[  218.454631][   T51]  ? __pfx_migrate_pages_batch+0x10/0x10
[  218.454638][   T51]  ? cgroup_rstat_updated+0xf1/0x860
[  218.454648][   T51]  migrate_pages_sync+0x10c/0x8e0
[  218.454656][   T51]  ? __pfx_compaction_alloc+0x10/0x10
[  218.454662][   T51]  ? __pfx_compaction_free+0x10/0x10
[  218.454669][   T51]  ? lru_gen_del_folio+0x383/0x820
[  218.454677][   T51]  ? __pfx_migrate_pages_sync+0x10/0x10
[  218.454683][   T51]  ? set_pfnblock_flags_mask+0x179/0x220
[  218.454691][   T51]  ? __pfx_lru_gen_del_folio+0x10/0x10
[  218.454699][   T51]  ? __pfx_compaction_alloc+0x10/0x10
[  218.454705][   T51]  ? __pfx_compaction_free+0x10/0x10
[  218.454713][   T51]  migrate_pages+0x846/0xe30
[  218.454720][   T51]  ? __pfx_compaction_alloc+0x10/0x10
[  218.454726][   T51]  ? __pfx_compaction_free+0x10/0x10
[  218.454733][   T51]  ? __pfx_buffer_migrate_folio_norefs+0x10/0x10
[  218.454740][   T51]  ? __pfx_migrate_pages+0x10/0x10
[  218.454748][   T51]  ? isolate_migratepages+0x32d/0xbd0
[  218.454757][   T51]  compact_zone+0x9e1/0x1680
[  218.454767][   T51]  ? __pfx_compact_zone+0x10/0x10
[  218.454774][   T51]  ? _raw_spin_lock_irqsave+0x87/0xe0
[  218.454780][   T51]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  218.454788][   T51]  compact_node+0x159/0x250
[  218.454795][   T51]  ? __pfx_compact_node+0x10/0x10
[  218.454807][   T51]  ? __pfx_extfrag_for_order+0x10/0x10
[  218.454814][   T51]  ? __pfx_mutex_unlock+0x10/0x10
[  218.454822][   T51]  ? finish_wait+0xd1/0x280
[  218.454831][   T51]  kcompactd+0x582/0x960
[  218.454839][   T51]  ? __pfx_kcompactd+0x10/0x10
[  218.454846][   T51]  ? _raw_spin_lock_irqsave+0x87/0xe0
[  218.454852][   T51]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  218.454858][   T51]  ? __pfx_autoremove_wake_function+0x10/0x10
[  218.454867][   T51]  ? __kthread_parkme+0xba/0x1e0
[  218.454874][   T51]  ? __pfx_kcompactd+0x10/0x10
[  218.454880][   T51]  kthread+0x3a1/0x770
[  218.454887][   T51]  ? __pfx_kthread+0x10/0x10
[  218.454895][   T51]  ? __pfx_kthread+0x10/0x10
[  218.454902][   T51]  ret_from_fork+0x30/0x70
[  218.454910][   T51]  ? __pfx_kthread+0x10/0x10
[  218.454915][   T51]  ret_from_fork_asm+0x1a/0x30
[  218.454924][   T51]  </TASK>

So the only thing I can think of is that the patch pushes more large
folios to be used, and so compaction can be a secondary effect which
managed to trigger another mm issue. I know there was a recent
migration fix, but I can't see the relationship at all either.

  Luis



* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-18  8:15     ` [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c Luis Chamberlain
@ 2025-03-18 14:37       ` Matthew Wilcox
  2025-03-18 23:17         ` Luis Chamberlain
  2025-03-20  1:24       ` Lai, Yi
  1 sibling, 1 reply; 31+ messages in thread
From: Matthew Wilcox @ 2025-03-18 14:37 UTC (permalink / raw)
  To: Luis Chamberlain, Jan Kara
  Cc: Oliver Sang, David Hildenbrand, Alistair Popple, linux-mm,
	Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry,
	linux-block, ltp, Pankaj Raghav, Daniel Gomez

On Tue, Mar 18, 2025 at 01:15:33AM -0700, Luis Chamberlain wrote:
> I also can't see how the patch ("block/bdev: enable large folio
> support for large logical block sizes") would trigger this.

Easy enough to see by checking the backtrace.

> [  218.454517][   T51]  folio_mc_copy+0xca/0x1f0
> [  218.454532][   T51]  __migrate_folio+0x11a/0x2d0
> [  218.454541][   T51]  __buffer_migrate_folio+0x558/0x660

folio_mc_copy() calls cond_resched() for large folios only.
__buffer_migrate_folio() calls spin_lock(&mapping->i_private_lock)

so for folios without buffer heads attached, we never take the spinlock,
and for small folios we never call cond_resched().  It's only the
compaction path for large folios with buffer_heads attached that
calls cond_resched() while holding a spinlock.

Jan was the one who extended the spinlock to be held over the copy
in ebdf4de5642f so adding him for thoughts.
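
To make that concrete, here is a condensed sketch of the two paths involved
(paraphrased from mm/util.c and mm/migrate.c in this tree; the bodies are
heavily trimmed and not the literal source):

/* mm/util.c (condensed): copy a folio page by page, sleeping between pages */
int folio_mc_copy(struct folio *dst, struct folio *src)
{
	long nr = folio_nr_pages(src);
	long i = 0;

	for (;;) {
		if (copy_mc_highpage(folio_page(dst, i), folio_page(src, i)))
			return -EHWPOISON;
		if (++i == nr)
			break;
		cond_resched();		/* only ever reached for large folios */
	}
	return 0;
}

/* mm/migrate.c (condensed): the buffer_migrate_folio_norefs() path */
static int __buffer_migrate_folio(struct address_space *mapping,
		struct folio *dst, struct folio *src,
		enum migrate_mode mode, bool check_refs)
{
	int rc;

	/* ... extra-ref checks and buffer locking elided ... */

	if (check_refs)
		spin_lock(&mapping->i_private_lock);	/* atomic from here on */

	/*
	 * __migrate_folio() ends up in folio_mc_copy(), so for a large
	 * folio with buffer heads attached the cond_resched() above now
	 * runs while i_private_lock is still held.
	 */
	rc = __migrate_folio(mapping, dst, src, folio_get_private(src), mode);

	/* ... buffer head fixups elided ... */

	if (check_refs)
		spin_unlock(&mapping->i_private_lock);
	return rc;
}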




* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-18 14:37       ` Matthew Wilcox
@ 2025-03-18 23:17         ` Luis Chamberlain
  2025-03-19  2:58           ` Matthew Wilcox
  0 siblings, 1 reply; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-18 23:17 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jan Kara, Oliver Sang, David Hildenbrand, Alistair Popple,
	linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp,
	John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez

On Tue, Mar 18, 2025 at 02:37:29PM +0000, Matthew Wilcox wrote:
> On Tue, Mar 18, 2025 at 01:15:33AM -0700, Luis Chamberlain wrote:
> > I also can't see how the patch ("block/bdev: enable large folio
> > support for large logical block sizes") would trigger this.
> 
> Easy enough to see by checking the backtrace.
> 
> > [  218.454517][   T51]  folio_mc_copy+0xca/0x1f0
> > [  218.454532][   T51]  __migrate_folio+0x11a/0x2d0
> > [  218.454541][   T51]  __buffer_migrate_folio+0x558/0x660
> 
> folio_mc_copy() calls cond_resched() for large folios only.
> __buffer_migrate_folio() calls spin_lock(&mapping->i_private_lock)
> 
> so for folios without buffer heads attached, we never take the spinlock,
> and for small folios we never call cond_resched().  It's only the
> compaction path for large folios with buffer_heads attached that
> calls cond_resched() while holding a spinlock.
> 
> Jan was the one who extended the spinlock to be held over the copy
> in ebdf4de5642f so adding him for thoughts.

Ah, then that LTP test isn't going to easily reproduce bugs around
compaction. To help proactively find compaction bugs more
deterministically we wrote generic/750, and indeed we can easily see
issues creep up with SOAK_DURATION=9000 on ext4 on linux-next as of
yesterday, next-20250317.

Mar 18 07:10:59 extra-ext4-defaults kernel: Linux version 6.14.0-rc7-next-20250317 (mcgrof@beef) (gcc (Debian 14.2.0-16) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #30 SMP PREEMPT_DYNAMIC Tue Mar 18 07:05:01 UTC 2025
Mar 18 07:10:59 extra-ext4-defaults kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.14.0-rc7-next-20250317 root=PARTUUID=503fa6f2-2d5b-4d7e-8cf8-3a811de326ce ro console=tty0 console=tty1 console=ttyS0,115200n8 console=ttyS0
Mar 18 07:10:59 extra-ext4-defaults kernel: BIOS-provided physical RAM map:

<-- etc -->

Mar 18 23:09:29 extra-ext4-defaults kernel: EXT4-fs (loop16): mounted filesystem dc4fc2d3-efb6-4c07-8e2d-e9cf1f9f9773 r/w with ordered data mode. Quota mode: none.
Mar 18 23:09:32 extra-ext4-defaults kernel: EXT4-fs (loop5): mounted filesystem 08064f5c-03f9-4176-a738-ca5df9f258de r/w with ordered data mode. Quota mode: none.
Mar 18 23:09:32 extra-ext4-defaults kernel: EXT4-fs (loop5): unmounting filesystem 08064f5c-03f9-4176-a738-ca5df9f258de.
Mar 18 23:09:32 extra-ext4-defaults kernel: EXT4-fs (loop16): unmounting filesystem dc4fc2d3-efb6-4c07-8e2d-e9cf1f9f9773.
Mar 18 23:09:32 extra-ext4-defaults kernel: EXT4-fs (loop16): mounted filesystem dc4fc2d3-efb6-4c07-8e2d-e9cf1f9f9773 r/w with ordered data mode. Quota mode: none.
Mar 18 23:09:32 extra-ext4-defaults unknown: run fstests generic/750 at 2025-03-18 23:09:32
Mar 18 23:09:33 extra-ext4-defaults kernel: EXT4-fs (loop5): mounted filesystem bf5fcb06-8f03-4384-bd24-3a88418a08c3 r/w with ordered data mode. Quota mode: none.
Mar 18 23:10:21 extra-ext4-defaults kernel: BUG: unable to handle page fault for address: ffff9d5640010c48
Mar 18 23:10:21 extra-ext4-defaults kernel: #PF: supervisor read access in kernel mode
Mar 18 23:10:21 extra-ext4-defaults kernel: #PF: error_code(0x0000) - not-present page
Mar 18 23:10:21 extra-ext4-defaults kernel: PGD 38601067 P4D 38601067 PUD 0 
Mar 18 23:10:21 extra-ext4-defaults kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
Mar 18 23:10:21 extra-ext4-defaults kernel: CPU: 0 UID: 0 PID: 74 Comm: kcompactd0 Not tainted 6.14.0-rc7-next-20250317 #30
Mar 18 23:10:21 extra-ext4-defaults kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.11-5 01/28/2025
Mar 18 23:10:21 extra-ext4-defaults kernel: RIP: 0010:__zone_watermark_ok+0x4e/0x1e0
Mar 18 23:10:21 extra-ext4-defaults kernel: Code: 00 00 00 41 f7 c0 38 02 00 00 0f 85 2c 01 00 00 48 8b 4f 30 48 63 d2 48 01 ca 85 db 0f 84 f3 00 00 00 49 29 d1 bb 80 00 00 00 <4c> 03 54 f7 38 31 d2 4d 39 ca 0f 8d d2 00 00 00 ba 01 00 00 00 85
Mar 18 23:10:21 extra-ext4-defaults kernel: RSP: 0018:ffffbf47c02b7c78 EFLAGS: 00010202
Mar 18 23:10:21 extra-ext4-defaults kernel: RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000
Mar 18 23:10:21 extra-ext4-defaults kernel: RDX: 0000000000000000 RSI: 0000000000002f52 RDI: ffff9d563fff9180
Mar 18 23:10:21 extra-ext4-defaults kernel: RBP: 0000000000000009 R08: 0000000000000080 R09: 00000000000030a1
Mar 18 23:10:21 extra-ext4-defaults kernel: R10: 0000000000000be4 R11: 0000000000000be4 R12: 0000000000000002
Mar 18 23:10:21 extra-ext4-defaults kernel: R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000002f52
Mar 18 23:10:21 extra-ext4-defaults kernel: FS:  0000000000000000(0000) GS:ffff9d56b6cce000(0000) knlGS:0000000000000000
Mar 18 23:10:21 extra-ext4-defaults kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 18 23:10:21 extra-ext4-defaults kernel: CR2: ffff9d5640010c48 CR3: 0000000115920006 CR4: 0000000000772ef0
Mar 18 23:10:21 extra-ext4-defaults kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 18 23:10:21 extra-ext4-defaults kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 18 23:10:21 extra-ext4-defaults kernel: PKRU: 55555554
Mar 18 23:10:21 extra-ext4-defaults kernel: Call Trace:
Mar 18 23:10:21 extra-ext4-defaults kernel:  <TASK>
Mar 18 23:10:21 extra-ext4-defaults kernel:  ? __die_body.cold+0x19/0x28
Mar 18 23:10:21 extra-ext4-defaults kernel:  ? page_fault_oops+0xa1/0x230
Mar 18 23:10:21 extra-ext4-defaults kernel:  ? search_module_extables+0x40/0x60
Mar 18 23:10:21 extra-ext4-defaults kernel:  ? __zone_watermark_ok+0x4e/0x1e0
Mar 18 23:10:21 extra-ext4-defaults kernel:  ? search_bpf_extables+0x5b/0x80
Mar 18 23:10:21 extra-ext4-defaults kernel:  ? exc_page_fault+0x16d/0x190
Mar 18 23:10:21 extra-ext4-defaults kernel:  ? asm_exc_page_fault+0x22/0x30
Mar 18 23:10:21 extra-ext4-defaults kernel:  ? __zone_watermark_ok+0x4e/0x1e0
Mar 18 23:10:21 extra-ext4-defaults kernel:  ? hrtimer_try_to_cancel+0x78/0x110
Mar 18 23:10:21 extra-ext4-defaults kernel:  compaction_suitable+0x4b/0xf0
Mar 18 23:10:21 extra-ext4-defaults kernel:  compaction_suit_allocation_order+0x8f/0x110
Mar 18 23:10:21 extra-ext4-defaults kernel:  kcompactd_do_work+0xbc/0x260
Mar 18 23:10:21 extra-ext4-defaults kernel:  kcompactd+0x396/0x3e0
Mar 18 23:10:21 extra-ext4-defaults kernel:  ? __pfx_autoremove_wake_function+0x10/0x10
Mar 18 23:10:21 extra-ext4-defaults kernel:  ? __pfx_kcompactd+0x10/0x10
Mar 18 23:10:21 extra-ext4-defaults kernel:  kthread+0xf6/0x240
Mar 18 23:10:21 extra-ext4-defaults kernel:  ? __pfx_kthread+0x10/0x10
Mar 18 23:10:21 extra-ext4-defaults kernel:  ? _raw_spin_unlock+0x15/0x30
Mar 18 23:10:21 extra-ext4-defaults kernel:  ? finish_task_switch.isra.0+0x94/0x290
Mar 18 23:10:21 extra-ext4-defaults kernel:  ? __pfx_kthread+0x10/0x10
Mar 18 23:10:21 extra-ext4-defaults kernel:  ret_from_fork+0x2d/0x50
Mar 18 23:10:21 extra-ext4-defaults kernel:  ? __pfx_kthread+0x10/0x10
Mar 18 23:10:21 extra-ext4-defaults kernel:  ret_from_fork_asm+0x1a/0x30
Mar 18 23:10:21 extra-ext4-defaults kernel:  </TASK>
Mar 18 23:10:21 extra-ext4-defaults kernel: Modules linked in: exfat xfs ext2 loop sunrpc 9p nls_iso8859_1 nls_cp437 crc32c_generic vfat fat kvm_intel kvm ghash_clmulni_intel sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 aesni_intel gf128mul crypto_simd cryptd 9pnet_virtio virtio_console virtio_balloon button joydev evdev serio_raw nvme_fabrics dm_mod nvme_core drm vsock_loopback vmw_vsock_virtio_transport_common vsock nfnetlink autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 md_mod virtio_net net_failover failover virtio_blk psmouse virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring
Mar 18 23:10:21 extra-ext4-defaults kernel: CR2: ffff9d5640010c48
Mar 18 23:10:21 extra-ext4-defaults kernel: ---[ end trace 0000000000000000 ]---
Mar 18 23:10:21 extra-ext4-defaults kernel: RIP: 0010:__zone_watermark_ok+0x4e/0x1e0
Mar 18 23:10:21 extra-ext4-defaults kernel: Code: 00 00 00 41 f7 c0 38 02 00 00 0f 85 2c 01 00 00 48 8b 4f 30 48 63 d2 48 01 ca 85 db 0f 84 f3 00 00 00 49 29 d1 bb 80 00 00 00 <4c> 03 54 f7 38 31 d2 4d 39 ca 0f 8d d2 00 00 00 ba 01 00 00 00 85
Mar 18 23:10:21 extra-ext4-defaults kernel: RSP: 0018:ffffbf47c02b7c78 EFLAGS: 00010202
Mar 18 23:10:21 extra-ext4-defaults kernel: RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000
Mar 18 23:10:21 extra-ext4-defaults kernel: RDX: 0000000000000000 RSI: 0000000000002f52 RDI: ffff9d563fff9180
Mar 18 23:10:21 extra-ext4-defaults kernel: RBP: 0000000000000009 R08: 0000000000000080 R09: 00000000000030a1
Mar 18 23:10:21 extra-ext4-defaults kernel: R10: 0000000000000be4 R11: 0000000000000be4 R12: 0000000000000002
Mar 18 23:10:21 extra-ext4-defaults kernel: R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000002f52
Mar 18 23:10:21 extra-ext4-defaults kernel: FS:  0000000000000000(0000) GS:ffff9d56b6cce000(0000) knlGS:0000000000000000
Mar 18 23:10:21 extra-ext4-defaults kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 18 23:10:21 extra-ext4-defaults kernel: CR2: ffff9d5640010c48 CR3: 0000000115920006 CR4: 0000000000772ef0
Mar 18 23:10:21 extra-ext4-defaults kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 18 23:10:21 extra-ext4-defaults kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 18 23:10:21 extra-ext4-defaults kernel: PKRU: 55555554
Mar 18 23:10:21 extra-ext4-defaults kernel: note: kcompactd0[74] exited with irqs disabled



* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-18 23:17         ` Luis Chamberlain
@ 2025-03-19  2:58           ` Matthew Wilcox
  2025-03-19 16:55             ` Luis Chamberlain
  0 siblings, 1 reply; 31+ messages in thread
From: Matthew Wilcox @ 2025-03-19  2:58 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Jan Kara, Oliver Sang, David Hildenbrand, Alistair Popple,
	linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp,
	John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez

On Tue, Mar 18, 2025 at 04:17:54PM -0700, Luis Chamberlain wrote:
> Ah, then that LTP test isn't going to easily reproduce bugs around
> compaction. To help proactively find compaction bugs more
> deterministically we wrote generic/750, and indeed we can easily see
> issues creep up with SOAK_DURATION=9000 on ext4 on linux-next as of
> yesterday, next-20250317.

Umm ... this is an entirely separate bug.  How much CONFIG_DEBUG do you
have enabled (i.e. is this a consequence of something that we have an
assert for, but you've disabled)?

> BUG: unable to handle page fault for address: ffff9d5640010c48
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 38601067 P4D 38601067 PUD 0 
> Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 0 UID: 0 PID: 74 Comm: kcompactd0 Not tainted 6.14.0-rc7-next-20250317 #30
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.11-5 01/28/2025
> RIP: 0010:__zone_watermark_ok+0x4e/0x1e0
> Code: 00 00 00 41 f7 c0 38 02 00 00 0f 85 2c 01 00 00 48 8b 4f 30 48 63 d2 48 01 ca 85 db 0f 84 f3 00 00 00 49 29 d1 bb 80 00 00 00 <4c> 03 54 f7 38 31 d2 4d 39 ca 0f 8d d2 00 00 00 ba 01 00 00 00 85
> RSP: 0018:ffffbf47c02b7c78 EFLAGS: 00010202
> RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000002f52 RDI: ffff9d563fff9180
> RBP: 0000000000000009 R08: 0000000000000080 R09: 00000000000030a1
> R10: 0000000000000be4 R11: 0000000000000be4 R12: 0000000000000002
> R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000002f52

  2a:*	4c 03 54 f7 38       	add    0x38(%rdi,%rsi,8),%r10		<-- trapping instruction

Not quite sure what this is.  Perhaps running this through decode_stacktrace.sh
would be helpful?

> FS:  0000000000000000(0000) GS:ffff9d56b6cce000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffff9d5640010c48 CR3: 0000000115920006 CR4: 0000000000772ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
>  <TASK>
>  ? __die_body.cold+0x19/0x28
>  ? page_fault_oops+0xa1/0x230
>  ? search_module_extables+0x40/0x60
>  ? __zone_watermark_ok+0x4e/0x1e0
>  ? search_bpf_extables+0x5b/0x80
>  ? exc_page_fault+0x16d/0x190
>  ? __zone_watermark_ok+0x4e/0x1e0
>  ? hrtimer_try_to_cancel+0x78/0x110
>  compaction_suit_allocation_order+0x8f/0x110
>  kcompactd_do_work+0xbc/0x260
>  kcompactd+0x396/0x3e0
>  ? __pfx_autoremove_wake_function+0x10/0x10
>  ? __pfx_kcompactd+0x10/0x10
>  kthread+0xf6/0x240
>  ? __pfx_kthread+0x10/0x10
>  ? _raw_spin_unlock+0x15/0x30
>  ? finish_task_switch.isra.0+0x94/0x290
>  ? __pfx_kthread+0x10/0x10
>  ret_from_fork+0x2d/0x50
>  ? __pfx_kthread+0x10/0x10
>  ret_from_fork_asm+0x1a/0x30
>  </TASK>
> Modules linked in: exfat xfs ext2 loop sunrpc 9p nls_iso8859_1 nls_cp437 crc32c_generic vfat fat kvm_intel kvm ghash_clmulni_intel sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 aesni_intel gf128mul crypto_simd cryptd 9pnet_virtio virtio_console virtio_balloon button joydev evdev serio_raw nvme_fabrics dm_mod nvme_core drm vsock_loopback vmw_vsock_virtio_transport_common vsock nfnetlink autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 md_mod virtio_net net_failover failover virtio_blk psmouse virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring
> CR2: ffff9d5640010c48
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:__zone_watermark_ok+0x4e/0x1e0
> Code: 00 00 00 41 f7 c0 38 02 00 00 0f 85 2c 01 00 00 48 8b 4f 30 48 63 d2 48 01 ca 85 db 0f 84 f3 00 00 00 49 29 d1 bb 80 00 00 00 <4c> 03 54 f7 38 31 d2 4d 39 ca 0f 8d d2 00 00 00 ba 01 00 00 00 85
> RSP: 0018:ffffbf47c02b7c78 EFLAGS: 00010202
> RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000002f52 RDI: ffff9d563fff9180
> RBP: 0000000000000009 R08: 0000000000000080 R09: 00000000000030a1
> R10: 0000000000000be4 R11: 0000000000000be4 R12: 0000000000000002
> R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000002f52
> FS:  0000000000000000(0000) GS:ffff9d56b6cce000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffff9d5640010c48 CR3: 0000000115920006 CR4: 0000000000772ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> note: kcompactd0[74] exited with irqs disabled



* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-19  2:58           ` Matthew Wilcox
@ 2025-03-19 16:55             ` Luis Chamberlain
  2025-03-19 19:16               ` Luis Chamberlain
  0 siblings, 1 reply; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-19 16:55 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jan Kara, Oliver Sang, David Hildenbrand, Alistair Popple,
	linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp,
	John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez,
	David Bueso

On Wed, Mar 19, 2025 at 02:58:38AM +0000, Matthew Wilcox wrote:
> On Tue, Mar 18, 2025 at 04:17:54PM -0700, Luis Chamberlain wrote:
> > Ah, then that LTP test isn't going to easily reproduce bugs around
> > compaction. To help proactively find compaction bugs more
> > deterministically we wrote generic/750, and indeed we can easily see
> > issues creep up with SOAK_DURATION=9000 on ext4 on linux-next as of
> > yesterday, next-20250317.
> 
> Umm ... this is an entirely separate bug.  How much CONFIG_DEBUG do you
> have enabled (i.e. is this a consequence of something that we have an
> assert for, but you've disabled)?

grep ^CONFIG_DEBUG .config
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_MISC=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_DEBUG_INFO_COMPRESSED_NONE=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_FS_ALLOW_ALL=y
CONFIG_DEBUG_WX=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=16000
CONFIG_DEBUG_VM_IRQSOFF=y
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_PGFLAGS=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_PREEMPT=y
CONFIG_DEBUG_LIST=y
CONFIG_DEBUG_MAPLE_TREE=y

Let me know if you want me to enable some other ones; these are always
enabled in any kdevops reports.

> Not quite sure what this is.  Perhaps running this through decode_stacktrace.sh
> would be helpful?

Sure, here is a fresh splat on next-20250317. What can be seen here is
that the issue can be easily reproduced within just one minute of the
test running.  FWIW, I'm not seeing this crash or any kernel splat within the
same time (I'll let this run the full 2.5 hours now to verify) on
vanilla 6.14.0-rc3 + the 64k-sector-size patches, which would explain why I
hadn't seen this in my earlier testing over 10 ext4 profiles on fstests. This
particular crash seems likely to be an artifact of the development cycle on
next-20250317.

Mar 19 16:20:41 extra-ext4-defaults kernel: Linux version 6.14.0-rc7-next-20250317 (mcgrof@beef) (gcc (Debian 14.2.0-16) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #32 SMP PREEMPT_DYNAMIC Wed Mar 19 16:18:39 UTC 2025
Mar 19 16:20:41 extra-ext4-defaults kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.14.0-rc7-next-20250317 root=PARTUUID=503fa6f2-2d5b-4d7e-8cf8-3a811de326ce ro console=tty0 console=tty1 console=ttyS0,115200n8 console=ttyS0

< etc >

Mar 19 16:21:23 extra-ext4-defaults kernel: EXT4-fs (loop16): mounted filesystem 200cf81b-dd0f-4614-8c4b-6f4af34aa9ff r/w with ordered data mode. Quota mode: none.
Mar 19 16:21:29 extra-ext4-defaults kernel: EXT4-fs (loop5): mounted filesystem cd905b7c-532b-4244-96b7-d2b393f3b16e r/w with ordered data mode. Quota mode: none.
Mar 19 16:21:29 extra-ext4-defaults kernel: EXT4-fs (loop5): unmounting filesystem cd905b7c-532b-4244-96b7-d2b393f3b16e.
Mar 19 16:21:29 extra-ext4-defaults kernel: EXT4-fs (loop16): unmounting filesystem 200cf81b-dd0f-4614-8c4b-6f4af34aa9ff.
Mar 19 16:21:29 extra-ext4-defaults kernel: EXT4-fs (loop16): mounted filesystem 200cf81b-dd0f-4614-8c4b-6f4af34aa9ff r/w with ordered data mode. Quota mode: none.
Mar 19 16:21:29 extra-ext4-defaults unknown: run fstests generic/750 at 2025-03-19 16:21:29
Mar 19 16:21:30 extra-ext4-defaults kernel: EXT4-fs (loop5): mounted filesystem f7af9558-57b0-4266-8326-a1bdda0be33a r/w with ordered data mode. Quota mode: none.
Mar 19 16:22:28 extra-ext4-defaults kernel: BUG: unable to handle page fault for address: ffff8f0e00013350
Mar 19 16:22:28 extra-ext4-defaults kernel: #PF: supervisor read access in kernel mode
Mar 19 16:22:28 extra-ext4-defaults kernel: #PF: error_code(0x0000) - not-present page
Mar 19 16:22:28 extra-ext4-defaults kernel: PGD 158401067 P4D 158401067 PUD 0
Mar 19 16:22:28 extra-ext4-defaults kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
Mar 19 16:22:28 extra-ext4-defaults kernel: CPU: 2 UID: 0 PID: 74 Comm: kcompactd0 Not tainted 6.14.0-rc7-next-20250317 #32
Mar 19 16:22:28 extra-ext4-defaults kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.11-5 01/28/2025
Mar 19 16:22:28 extra-ext4-defaults kernel: RIP: 0010:__zone_watermark_ok (mm/page_alloc.c:3339) 
Mar 19 16:22:28 extra-ext4-defaults kernel: Code: 00 00 00 41 f7 c0 38 02 00 00 0f 85 2c 01 00 00 48 8b 4f 30 48 63 d2 48 01 ca 85 db 0f 84 f3 00 00 00 49 29 d1 bb 80 00 00 00 <4c> 03 54 f7 38 31 d2 4d 39 ca 0f 8d d2 00 00 00 ba 01 00 00 00 85
All code
========
   0:	00 00                	add    %al,(%rax)
   2:	00 41 f7             	add    %al,-0x9(%rcx)
   5:	c0 38 02             	sarb   $0x2,(%rax)
   8:	00 00                	add    %al,(%rax)
   a:	0f 85 2c 01 00 00    	jne    0x13c
  10:	48 8b 4f 30          	mov    0x30(%rdi),%rcx
  14:	48 63 d2             	movslq %edx,%rdx
  17:	48 01 ca             	add    %rcx,%rdx
  1a:	85 db                	test   %ebx,%ebx
  1c:	0f 84 f3 00 00 00    	je     0x115
  22:	49 29 d1             	sub    %rdx,%r9
  25:	bb 80 00 00 00       	mov    $0x80,%ebx
  2a:*	4c 03 54 f7 38       	add    0x38(%rdi,%rsi,8),%r10		<-- trapping instruction
  2f:	31 d2                	xor    %edx,%edx
  31:	4d 39 ca             	cmp    %r9,%r10
  34:	0f 8d d2 00 00 00    	jge    0x10c
  3a:	ba 01 00 00 00       	mov    $0x1,%edx
  3f:	85                   	.byte 0x85

Code starting with the faulting instruction
===========================================
   0:	4c 03 54 f7 38       	add    0x38(%rdi,%rsi,8),%r10
   5:	31 d2                	xor    %edx,%edx
   7:	4d 39 ca             	cmp    %r9,%r10
   a:	0f 8d d2 00 00 00    	jge    0xe2
  10:	ba 01 00 00 00       	mov    $0x1,%edx
  15:	85                   	.byte 0x85
Mar 19 16:22:28 extra-ext4-defaults kernel: RSP: 0018:ffffa3ed002b7c78 EFLAGS: 00010202
Mar 19 16:22:28 extra-ext4-defaults kernel: RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000
Mar 19 16:22:28 extra-ext4-defaults kernel: RDX: 0000000000000000 RSI: 0000000000003033 RDI: ffff8f0dffffb180
Mar 19 16:22:28 extra-ext4-defaults kernel: RBP: 0000000000000009 R08: 0000000000000080 R09: 0000000000002ffb
Mar 19 16:22:28 extra-ext4-defaults kernel: R10: 0000000000000c09 R11: 0000000000000c09 R12: 0000000000000002
Mar 19 16:22:28 extra-ext4-defaults kernel: R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000003033
Mar 19 16:22:28 extra-ext4-defaults kernel: FS:  0000000000000000(0000) GS:ffff8f0e72f4e000(0000) knlGS:0000000000000000
Mar 19 16:22:28 extra-ext4-defaults kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 19 16:22:28 extra-ext4-defaults kernel: CR2: ffff8f0e00013350 CR3: 0000000116942002 CR4: 0000000000772ef0
Mar 19 16:22:28 extra-ext4-defaults kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 19 16:22:28 extra-ext4-defaults kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 19 16:22:28 extra-ext4-defaults kernel: PKRU: 55555554
Mar 19 16:22:28 extra-ext4-defaults kernel: Call Trace:
Mar 19 16:22:28 extra-ext4-defaults kernel:  <TASK>
Mar 19 16:22:28 extra-ext4-defaults kernel: ? __die_body.cold (arch/x86/kernel/dumpstack.c:478 (discriminator 1) arch/x86/kernel/dumpstack.c:465 (discriminator 1) arch/x86/kernel/dumpstack.c:420 (discriminator 1)) 
Mar 19 16:22:28 extra-ext4-defaults kernel: ? page_fault_oops (arch/x86/mm/fault.c:710 (discriminator 1)) 
Mar 19 16:22:28 extra-ext4-defaults kernel: ? search_module_extables (kernel/module/main.c:3687) 
Mar 19 16:22:28 extra-ext4-defaults kernel: ? __zone_watermark_ok (mm/page_alloc.c:3339) 
Mar 19 16:22:28 extra-ext4-defaults kernel: ? search_bpf_extables (kernel/bpf/core.c:804) 
Mar 19 16:22:28 extra-ext4-defaults kernel: ? exc_page_fault (arch/x86/mm/fault.c:1182 (discriminator 1) arch/x86/mm/fault.c:1478 (discriminator 1) arch/x86/mm/fault.c:1538 (discriminator 1)) 
Mar 19 16:22:28 extra-ext4-defaults kernel: ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:574) 
Mar 19 16:22:28 extra-ext4-defaults kernel: ? __zone_watermark_ok (mm/page_alloc.c:3339) 
Mar 19 16:22:28 extra-ext4-defaults kernel: compaction_suitable (mm/compaction.c:2454) 
Mar 19 16:22:28 extra-ext4-defaults kernel: compaction_suit_allocation_order (mm/compaction.c:2547) 
Mar 19 16:22:28 extra-ext4-defaults kernel: kcompactd_do_work (mm/compaction.c:3129) 
Mar 19 16:22:28 extra-ext4-defaults kernel: kcompactd (mm/compaction.c:3243) 
Mar 19 16:22:28 extra-ext4-defaults kernel: ? __pfx_autoremove_wake_function (kernel/sched/wait.c:383) 
Mar 19 16:22:28 extra-ext4-defaults kernel: ? __pfx_kcompactd (mm/compaction.c:3207) 
Mar 19 16:22:28 extra-ext4-defaults kernel: kthread (kernel/kthread.c:464) 
Mar 19 16:22:28 extra-ext4-defaults kernel: ? __pfx_kthread (kernel/kthread.c:413) 
Mar 19 16:22:28 extra-ext4-defaults kernel: ? _raw_spin_unlock (./include/linux/spinlock_api_smp.h:143 (discriminator 3) kernel/locking/spinlock.c:186 (discriminator 3)) 
Mar 19 16:22:28 extra-ext4-defaults kernel: ? finish_task_switch.isra.0 (./arch/x86/include/asm/paravirt.h:686 kernel/sched/sched.h:1533 kernel/sched/core.c:5125 kernel/sched/core.c:5243) 
Mar 19 16:22:28 extra-ext4-defaults kernel: ? __pfx_kthread (kernel/kthread.c:413) 
Mar 19 16:22:28 extra-ext4-defaults kernel: ret_from_fork (arch/x86/kernel/process.c:153) 
Mar 19 16:22:28 extra-ext4-defaults kernel: ? __pfx_kthread (kernel/kthread.c:413) 
Mar 19 16:22:28 extra-ext4-defaults kernel: ret_from_fork_asm (arch/x86/entry/entry_64.S:258) 
Mar 19 16:22:28 extra-ext4-defaults kernel:  </TASK>
Mar 19 16:22:28 extra-ext4-defaults kernel: Modules linked in: loop sunrpc 9p nls_iso8859_1 nls_cp437 crc32c_generic vfat fat kvm_intel kvm ghash_clmulni_intel sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 aesni_intel gf128mul crypto_simd virtio_balloon cryptd 9pnet_virtio virtio_console joydev evdev button serio_raw nvme_fabrics dm_mod nvme_core drm nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vsock autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 md_mod virtio_net net_failover failover virtio_blk virtio_pci psmouse virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring
Mar 19 16:22:28 extra-ext4-defaults kernel: CR2: ffff8f0e00013350
Mar 19 16:22:28 extra-ext4-defaults kernel: ---[ end trace 0000000000000000 ]---
Mar 19 16:22:28 extra-ext4-defaults kernel: RIP: 0010:__zone_watermark_ok (mm/page_alloc.c:3339) 
Mar 19 16:22:28 extra-ext4-defaults kernel: Code: 00 00 00 41 f7 c0 38 02 00 00 0f 85 2c 01 00 00 48 8b 4f 30 48 63 d2 48 01 ca 85 db 0f 84 f3 00 00 00 49 29 d1 bb 80 00 00 00 <4c> 03 54 f7 38 31 d2 4d 39 ca 0f 8d d2 00 00 00 ba 01 00 00 00 85
All code
========
   0:	00 00                	add    %al,(%rax)
   2:	00 41 f7             	add    %al,-0x9(%rcx)
   5:	c0 38 02             	sarb   $0x2,(%rax)
   8:	00 00                	add    %al,(%rax)
   a:	0f 85 2c 01 00 00    	jne    0x13c
  10:	48 8b 4f 30          	mov    0x30(%rdi),%rcx
  14:	48 63 d2             	movslq %edx,%rdx
  17:	48 01 ca             	add    %rcx,%rdx
  1a:	85 db                	test   %ebx,%ebx
  1c:	0f 84 f3 00 00 00    	je     0x115
  22:	49 29 d1             	sub    %rdx,%r9
  25:	bb 80 00 00 00       	mov    $0x80,%ebx
  2a:*	4c 03 54 f7 38       	add    0x38(%rdi,%rsi,8),%r10		<-- trapping instruction
  2f:	31 d2                	xor    %edx,%edx
  31:	4d 39 ca             	cmp    %r9,%r10
  34:	0f 8d d2 00 00 00    	jge    0x10c
  3a:	ba 01 00 00 00       	mov    $0x1,%edx
  3f:	85                   	.byte 0x85

Code starting with the faulting instruction
===========================================
   0:	4c 03 54 f7 38       	add    0x38(%rdi,%rsi,8),%r10
   5:	31 d2                	xor    %edx,%edx
   7:	4d 39 ca             	cmp    %r9,%r10
   a:	0f 8d d2 00 00 00    	jge    0xe2
  10:	ba 01 00 00 00       	mov    $0x1,%edx
  15:	85                   	.byte 0x85
Mar 19 16:22:28 extra-ext4-defaults kernel: RSP: 0018:ffffa3ed002b7c78 EFLAGS: 00010202
Mar 19 16:22:28 extra-ext4-defaults kernel: RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000
Mar 19 16:22:28 extra-ext4-defaults kernel: RDX: 0000000000000000 RSI: 0000000000003033 RDI: ffff8f0dffffb180
Mar 19 16:22:28 extra-ext4-defaults kernel: RBP: 0000000000000009 R08: 0000000000000080 R09: 0000000000002ffb
Mar 19 16:22:28 extra-ext4-defaults kernel: R10: 0000000000000c09 R11: 0000000000000c09 R12: 0000000000000002
Mar 19 16:22:28 extra-ext4-defaults kernel: R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000003033
Mar 19 16:22:28 extra-ext4-defaults kernel: FS:  0000000000000000(0000) GS:ffff8f0e72f4e000(0000) knlGS:0000000000000000
Mar 19 16:22:28 extra-ext4-defaults kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 19 16:22:28 extra-ext4-defaults kernel: CR2: ffff8f0e00013350 CR3: 0000000116942002 CR4: 0000000000772ef0
Mar 19 16:22:28 extra-ext4-defaults kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 19 16:22:28 extra-ext4-defaults kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 19 16:22:28 extra-ext4-defaults kernel: PKRU: 55555554
Mar 19 16:22:28 extra-ext4-defaults kernel: note: kcompactd0[74] exited with irqs disabled



* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-19 16:55             ` Luis Chamberlain
@ 2025-03-19 19:16               ` Luis Chamberlain
  2025-03-19 19:24                 ` Matthew Wilcox
  0 siblings, 1 reply; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-19 19:16 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jan Kara, Oliver Sang, David Hildenbrand, Alistair Popple,
	linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp,
	John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez,
	David Bueso

On Wed, Mar 19, 2025 at 09:55:11AM -0700, Luis Chamberlain wrote:
> FWIW, I'm not seeing this crash or any kernel splat within the
> same time (I'll let this run the full 2.5 hours now to verify) on
> vanilla 6.14.0-rc3 + the 64k-sector-size patches, which would explain why I
> hadn't seen this in my earlier testing over 10 ext4 profiles on fstests. This
> particular crash seems likely to be an artifact of the development cycle on
> next-20250317.

I confirm that with a vanilla 6.14.0-rc3 + the 64k-sector-size patches a 2.5
hour run of generic/750 doesn't crash at all. So indeed something in the
development cycle leads to this particular crash.

  Luis



* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-19 19:16               ` Luis Chamberlain
@ 2025-03-19 19:24                 ` Matthew Wilcox
  2025-03-20 12:11                   ` Luis Chamberlain
  0 siblings, 1 reply; 31+ messages in thread
From: Matthew Wilcox @ 2025-03-19 19:24 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Jan Kara, Oliver Sang, David Hildenbrand, Alistair Popple,
	linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp,
	John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez,
	David Bueso

On Wed, Mar 19, 2025 at 12:16:41PM -0700, Luis Chamberlain wrote:
> On Wed, Mar 19, 2025 at 09:55:11AM -0700, Luis Chamberlain wrote:
> > FWIW, I'm not seeing this crash or any kernel splat within the
> > same time (I'll let this run the full 2.5 hours now to verify) on
> > vanilla 6.14.0-rc3 + the 64k-sector-size patches, which would explain why I
> > hadn't seen this in my earlier testing over 10 ext4 profiles on fstests. This
> > particular crash seems likely to be an artifact of the development cycle on
> > next-20250317.
> 
> I confirm that with a vanilla 6.14.0-rc3 + the 64k-sector-size patches a 2.5
> hour run of generic/750 doesn't crash at all. So indeed something in the
> development cycle leads to this particular crash.

We can't debug two problems at once.

For the first problem, I've demonstrated what the cause is, and that's
definitely introduced by your patch, so we need to figure out a
solution.

For the second problem, we don't know what it is.  Do you want to bisect
it to figure out which commit introduced it?



* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-18  8:15     ` [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c Luis Chamberlain
  2025-03-18 14:37       ` Matthew Wilcox
@ 2025-03-20  1:24       ` Lai, Yi
  1 sibling, 0 replies; 31+ messages in thread
From: Lai, Yi @ 2025-03-20  1:24 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Oliver Sang, David Hildenbrand, Alistair Popple, linux-mm,
	Christian Brauner, Hannes Reinecke, oe-lkp, lkp,
	Matthew Wilcox (Oracle),
	John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez,
	yi1.lai

On Tue, Mar 18, 2025 at 01:15:33AM -0700, Luis Chamberlain wrote:
> On Tue, Mar 18, 2025 at 01:28:20PM +0800, Oliver Sang wrote:
> > hi, Christian Brauner,
> > 
> > On Tue, Mar 11, 2025 at 01:10:43PM +0100, Christian Brauner wrote:
> > > On Mon, Mar 10, 2025 at 03:43:49PM +0800, kernel test robot wrote:
> > > > 
> > > > 
> > > > Hello,
> > > > 
> > > > kernel test robot noticed "BUG:sleeping_function_called_from_invalid_context_at_mm/util.c" on:
> > > > 
> > > > commit: 3c20917120ce61f2a123ca0810293872f4c6b5a4 ("block/bdev: enable large folio support for large logical block sizes")
> > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > > 
> > > Is this also already fixed by:
> > > 
> > > commit a64e5a596067 ("bdev: add back PAGE_SIZE block size validation for sb_set_blocksize()")
> > > 
> > > ?
> > 
> > sorry for the late reply.
> > 
> > commit a64e5a596067 cannot fix the issue. one dmesg is attached FYI.
> > 
> > we also tried to check the linux-next/master tip, but neither of the commits
> > below can boot successfully in our env, which we need to check further.
> > 
> > da920b7df70177 (tag: next-20250314, linux-next/master) Add linux-next specific files for 20250314
> > 
> > e94bd4ec45ac1 (tag: next-20250317, linux-next/master) Add linux-next specific files for 20250317
> > 
> > so we are not sure of the status of the latest linux-next/master.
> > 
> > if you want us to check other commits or other patches, please let us know. thanks!
> 
> I cannot reproduce the issue by running the LTP test manually in a loop
> for a long time:
> 
> export LTP_RUNTIME_MUL=2
> 
> while true; do \
> 	./testcases/kernel/syscalls/close_range/close_range01; done
> 
> What's the failure rate of just running the test alone above?
> Does it always fail on this system? Is this a deterministic failure
> or does it have a lower failure rate?
>
Hi Luis,

Greetings!

I used Syzkaller and found that this issue can also be reproduced using the Syzkaller reproduction binary.

All detailed info can be found at:
https://github.com/laifryiee/syzkaller_logs/tree/main/250320_033346_folio_mc_copy
Syzkaller repro code:
https://github.com/laifryiee/syzkaller_logs/tree/main/250320_033346_folio_mc_copy/repro.c
Syzkaller repro syscall steps:
https://github.com/laifryiee/syzkaller_logs/tree/main/250320_033346_folio_mc_copy/repro.prog
Syzkaller report:
https://github.com/laifryiee/syzkaller_logs/tree/main/250320_033346_folio_mc_copy/repro.report
Kconfig(make olddefconfig):
https://github.com/laifryiee/syzkaller_logs/tree/main/250320_033346_folio_mc_copy/kconfig_origin
Bisect info:
https://github.com/laifryiee/syzkaller_logs/tree/main/250320_033346_folio_mc_copy/bisect_info.log
bzImage:
https://github.com/laifryiee/syzkaller_logs/raw/refs/heads/main/250320_033346_folio_mc_copy/bzImage_e94bd4ec45ac156616da285a0bf03056cd7430fc
Issue dmesg:
https://github.com/laifryiee/syzkaller_logs/blob/main/250320_033346_folio_mc_copy/e94bd4ec45ac156616da285a0bf03056cd7430fc_dmesg.log


After bisection, the first bad commit is:
"
3c20917120ce block/bdev: enable large folio support for large logical block sizes
"

"
[   23.399326]  dump_stack+0x19/0x20
[   23.399332]  __might_resched+0x37b/0x5a0
[   23.399345]  ? __kasan_check_read+0x15/0x20
[   23.399354]  folio_mc_copy+0x111/0x240
[   23.399368]  __migrate_folio.constprop.0+0x173/0x3c0
[   23.399377]  __buffer_migrate_folio+0x6a2/0x7b0
[   23.399389]  buffer_migrate_folio_norefs+0x3d/0x50
[   23.399398]  move_to_new_folio+0x153/0x5b0
[   23.399403]  ? __pfx_buffer_migrate_folio_norefs+0x10/0x10
[   23.399412]  migrate_pages_batch+0x19e0/0x2890
[   23.399424]  ? __pfx_compaction_free+0x10/0x10
[   23.399444]  ? __pfx_migrate_pages_batch+0x10/0x10
[   23.399450]  ? __kasan_check_read+0x15/0x20
[   23.399455]  ? __lock_acquire+0xdb6/0x5d60
[   23.399475]  ? __pfx___lock_acquire+0x10/0x10
[   23.399486]  migrate_pages+0x18de/0x2450
[   23.399500]  ? __pfx_compaction_free+0x10/0x10
[   23.399505]  ? __pfx_compaction_alloc+0x10/0x10
[   23.399514]  ? __pfx_migrate_pages+0x10/0x10
[   23.399519]  ? __this_cpu_preempt_check+0x21/0x30
[   23.399533]  ? rcu_is_watching+0x19/0xc0
[   23.399546]  ? isolate_migratepages_block+0x2253/0x41c0
[   23.399565]  ? __pfx_isolate_migratepages_block+0x10/0x10
[   23.399578]  compact_zone+0x1d66/0x4480
[   23.399600]  ? perf_trace_lock+0xe0/0x4f0
[   23.399612]  ? __pfx_compact_zone+0x10/0x10
[   23.399617]  ? __pfx_perf_trace_lock+0x10/0x10
[   23.399627]  ? __pfx_lock_acquire+0x10/0x10
[   23.399639]  compact_node+0x190/0x2c0
[   23.399647]  ? __pfx_compact_node+0x10/0x10
[   23.399653]  ? __pfx_lock_release+0x10/0x10
[   23.399678]  ? _raw_spin_unlock_irqrestore+0x45/0x70
[   23.399694]  kcompactd+0x784/0xde0
[   23.399705]  ? __pfx_kcompactd+0x10/0x10
[   23.399711]  ? lockdep_hardirqs_on+0x89/0x110
[   23.399721]  ? __pfx_autoremove_wake_function+0x10/0x10
[   23.399731]  ? __sanitizer_cov_trace_const_cmp1+0x1e/0x30
[   23.399742]  ? __kthread_parkme+0x15d/0x230
[   23.399753]  ? __pfx_kcompactd+0x10/0x10
[   23.399761]  kthread+0x444/0x980
[   23.399769]  ? __pfx_kthread+0x10/0x10
[   23.399776]  ? _raw_spin_unlock_irq+0x3c/0x60
[   23.399784]  ? __pfx_kthread+0x10/0x10
[   23.399792]  ret_from_fork+0x56/0x90
[   23.399802]  ? __pfx_kthread+0x10/0x10
[   23.399809]  ret_from_fork_asm+0x1a/0x30
[   23.399827]  </TASK>
"

Hope this could be insightful to you.

Regards,
Yi Lai

---

If you don't need the following environment to reproduce the problem, or if you
already have a reproduction environment, please ignore the following information.

How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh  // it needs qemu-system-x86_64 and I used v7.1.0
  // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
  // You could change the bzImage_xxx as you want
  // Maybe you need to remove line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for different qemu version
You can use the command below to log in; there is no password for root.
ssh -p 10023 root@localhost

After logging in to the vm (virtual machine) successfully, you can transfer the
reproducer binary to the vm as below, and reproduce the problem in the vm:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/

Get the bzImage for the target kernel:
Please use the target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage           //x should be equal to or less than the number of CPUs your PC has

Fill the bzImage file into the above start3.sh to load the target kernel in the vm.


Tips:
If you already have qemu-system-x86_64, please ignore the info below.
If you want to install qemu v7.1.0:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
yum -y install libslirp-devel.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
make
make install 

> I also can't see how the patch ("block/bdev: enable large folio
> support for large logical block sizes") would trigger this.
> 
> You could try this patch but ...
> 
> https://lore.kernel.org/all/20250312050028.1784117-1-mcgrof@kernel.org/
> 
> we decided this is not right and not needed, and if we have a buggy
> block driver we can address that.
> 
> I just can't see this LTP test actually doing anything funky with block
> devices at all.
> 
> The associated sleeping while atomic warning is triggered during
> compaction though:
> 
> [  218.143642][  T299] Architecture:                         x86_64
> [  218.143659][  T299] 
> [  218.427851][   T51] BUG: sleeping function called from invalid context at mm/util.c:901
> [  218.435981][   T51] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 51, name: kcompactd0
> [  218.444773][   T51] preempt_count: 1, expected: 0
> [  218.449601][   T51] RCU nest depth: 0, expected: 0
> [  218.454476][   T51] CPU: 2 UID: 0 PID: 51 Comm: kcompactd0 Tainted: G S                 6.14.0-rc1-00006-g3c20917120ce #1
> [  218.454486][   T51] Tainted: [S]=CPU_OUT_OF_SPEC
> [  218.454488][   T51] Hardware name: Hewlett-Packard HP Pro 3340 MT/17A1, BIOS 8.07 01/24/2013
> [  218.454492][   T51] Call Trace:
> [  218.454495][   T51]  <TASK>
> [  218.454498][   T51]  dump_stack_lvl+0x4f/0x70
> [  218.454508][   T51]  __might_resched+0x2c6/0x450
> [  218.454517][   T51]  folio_mc_copy+0xca/0x1f0
> [  218.454525][   T51]  ? _raw_spin_lock+0x81/0xe0
> [  218.454532][   T51]  __migrate_folio+0x11a/0x2d0
> [  218.454541][   T51]  __buffer_migrate_folio+0x558/0x660
> [  218.454548][   T51]  move_to_new_folio+0xf5/0x410
> [  218.454555][   T51]  migrate_folio_move+0x211/0x770
> [  218.454562][   T51]  ? __pfx_compaction_free+0x10/0x10
> [  218.454572][   T51]  ? __pfx_migrate_folio_move+0x10/0x10
> [  218.454578][   T51]  ? compaction_alloc_noprof+0x441/0x720
> [  218.454587][   T51]  ? __pfx_compaction_alloc+0x10/0x10
> [  218.454594][   T51]  ? __pfx_compaction_free+0x10/0x10
> [  218.454601][   T51]  ? __pfx_compaction_free+0x10/0x10
> [  218.454607][   T51]  ? migrate_folio_unmap+0x329/0x890
> [  218.454614][   T51]  migrate_pages_batch+0xddf/0x1810
> [  218.454621][   T51]  ? __pfx_compaction_free+0x10/0x10
> [  218.454631][   T51]  ? __pfx_migrate_pages_batch+0x10/0x10
> [  218.454638][   T51]  ? cgroup_rstat_updated+0xf1/0x860
> [  218.454648][   T51]  migrate_pages_sync+0x10c/0x8e0
> [  218.454656][   T51]  ? __pfx_compaction_alloc+0x10/0x10
> [  218.454662][   T51]  ? __pfx_compaction_free+0x10/0x10
> [  218.454669][   T51]  ? lru_gen_del_folio+0x383/0x820
> [  218.454677][   T51]  ? __pfx_migrate_pages_sync+0x10/0x10
> [  218.454683][   T51]  ? set_pfnblock_flags_mask+0x179/0x220
> [  218.454691][   T51]  ? __pfx_lru_gen_del_folio+0x10/0x10
> [  218.454699][   T51]  ? __pfx_compaction_alloc+0x10/0x10
> [  218.454705][   T51]  ? __pfx_compaction_free+0x10/0x10
> [  218.454713][   T51]  migrate_pages+0x846/0xe30
> [  218.454720][   T51]  ? __pfx_compaction_alloc+0x10/0x10
> [  218.454726][   T51]  ? __pfx_compaction_free+0x10/0x10
> [  218.454733][   T51]  ? __pfx_buffer_migrate_folio_norefs+0x10/0x10
> [  218.454740][   T51]  ? __pfx_migrate_pages+0x10/0x10
> [  218.454748][   T51]  ? isolate_migratepages+0x32d/0xbd0
> [  218.454757][   T51]  compact_zone+0x9e1/0x1680
> [  218.454767][   T51]  ? __pfx_compact_zone+0x10/0x10
> [  218.454774][   T51]  ? _raw_spin_lock_irqsave+0x87/0xe0
> [  218.454780][   T51]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> [  218.454788][   T51]  compact_node+0x159/0x250
> [  218.454795][   T51]  ? __pfx_compact_node+0x10/0x10
> [  218.454807][   T51]  ? __pfx_extfrag_for_order+0x10/0x10
> [  218.454814][   T51]  ? __pfx_mutex_unlock+0x10/0x10
> [  218.454822][   T51]  ? finish_wait+0xd1/0x280
> [  218.454831][   T51]  kcompactd+0x582/0x960
> [  218.454839][   T51]  ? __pfx_kcompactd+0x10/0x10
> [  218.454846][   T51]  ? _raw_spin_lock_irqsave+0x87/0xe0
> [  218.454852][   T51]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> [  218.454858][   T51]  ? __pfx_autoremove_wake_function+0x10/0x10
> [  218.454867][   T51]  ? __kthread_parkme+0xba/0x1e0
> [  218.454874][   T51]  ? __pfx_kcompactd+0x10/0x10
> [  218.454880][   T51]  kthread+0x3a1/0x770
> [  218.454887][   T51]  ? __pfx_kthread+0x10/0x10
> [  218.454895][   T51]  ? __pfx_kthread+0x10/0x10
> [  218.454902][   T51]  ret_from_fork+0x30/0x70
> [  218.454910][   T51]  ? __pfx_kthread+0x10/0x10
> [  218.454915][   T51]  ret_from_fork_asm+0x1a/0x30
> [  218.454924][   T51]  </TASK>
> 
> So the only thing I can think of is that the patch pushes more large
> folios to be used, and so compaction can be a secondary effect which
> managed to trigger another mm issue. I know there was a recent
> migration fix, but I can't see the relationship at all either.
> 
>   Luis



* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-19 19:24                 ` Matthew Wilcox
@ 2025-03-20 12:11                   ` Luis Chamberlain
  2025-03-20 12:18                     ` Luis Chamberlain
  2025-03-22 23:14                     ` Johannes Weiner
  0 siblings, 2 replies; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-20 12:11 UTC (permalink / raw)
  To: Matthew Wilcox, Johannes Weiner
  Cc: Jan Kara, Oliver Sang, David Hildenbrand, Alistair Popple,
	linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp,
	John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez,
	David Bueso

On Wed, Mar 19, 2025 at 07:24:23PM +0000, Matthew Wilcox wrote:
> On Wed, Mar 19, 2025 at 12:16:41PM -0700, Luis Chamberlain wrote:
> > On Wed, Mar 19, 2025 at 09:55:11AM -0700, Luis Chamberlain wrote:
> > > FWIW, I'm not seeing this crash or any kernel splat within the
> > > same time (I'll let this run the full 2.5 hours now to verify) on
> > > vanilla 6.14.0-rc3 + the 64k-sector-size patches, which would explain why I
> > > hadn't seen this in my earlier testing over 10 ext4 profiles on fstests. This
> > > particular crash seems likely to be an artifact of the development cycle on
> > > next-20250317.
> > 
> > I confirm that with a vanilla 6.14.0-rc3 + the 64k-sector-size patches a 2.5
> > hour run of generic/750 doesn't crash at all. So indeed something in the
> > development cycle leads to this particular crash.
> 
> We can't debug two problems at once.
> 
> For the first problem, I've demonstrated what the cause is, and that's
> definitely introduced by your patch, so we need to figure out a
> solution.

Sure, yeah I followed that.

> For the second problem, we don't know what it is.  Do you want to bisect
> it to figure out which commit introduced it?

Sure, the culprit is the patch titled:

mm: page_alloc: trace type pollution from compaction capturing

Johannes, any ideas? You can reproduce easily (in 1-2 minutes) by running
fstests against ext4 with a 4k block size filesystem on linux-next, with
the test generic/750.
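
For reference, the faulting RIP is in the zone watermark check; condensed
and paraphrased from mm/page_alloc.c (not the literal source), the relevant
part is roughly:

bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
			 int highest_zoneidx, unsigned int alloc_flags,
			 long free_pages)
{
	long min = mark;

	/* ... unusable-free and reserve adjustments elided ... */

	/*
	 * Check watermarks for an order-0 allocation request. If these
	 * are not met, then a high-order request also cannot go ahead
	 * even if a suitable page happened to be free.
	 */
	if (free_pages <= min + z->lowmem_reserve[highest_zoneidx])
		return false;

	/* ... per-order free list checks elided ... */
	return true;
}

My reading of the decoded oops below, and it is only a reading, is that the
trapping "add 0x38(%rdi,%rsi,8),%r10" is an indexed 8-byte load off the
struct zone pointer, which would be consistent with
z->lowmem_reserve[highest_zoneidx] being read with a garbage highest_zoneidx
(RSI is 0x2431 in the register dump).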

Below is the splat decoded.

Mar 20 11:52:55 extra-ext4-4k kernel: Linux version 6.14.0-rc6+ (mcgrof@beefy) (gcc (Debian 14.2.0-16) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #51 SMP PREEMPT_DYNAMIC Thu Mar 20 11:50:32 UTC 2025
Mar 20 11:52:55 extra-ext4-4k kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.14.0-rc6+ root=PARTUUID=503fa6f2-2d5b-4d7e-8cf8-3a811de326ce ro console=tty0 console=tty1 console=ttyS0,115200n8 console=ttyS0

< -- etc -->

Mar 20 11:55:27 extra-ext4-4k unknown: run fstests generic/750 at 2025-03-20 11:55:27
Mar 20 11:55:28 extra-ext4-4k kernel: EXT4-fs (loop5): mounted filesystem c20cbdee-a370-4743-80aa-95dec0beaaa2 r/w with ordered data mode. Quota mode: none.
Mar 20 11:56:29 extra-ext4-4k kernel: BUG: unable to handle page fault for address: ffff93098000ba00
Mar 20 11:56:29 extra-ext4-4k kernel: #PF: supervisor read access in kernel mode
Mar 20 11:56:29 extra-ext4-4k kernel: #PF: error_code(0x0000) - not-present page
Mar 20 11:56:29 extra-ext4-4k kernel: PGD 3a201067 P4D 3a201067 PUD 0
Mar 20 11:56:29 extra-ext4-4k kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
Mar 20 11:56:29 extra-ext4-4k kernel: CPU: 0 UID: 0 PID: 74 Comm: kcompactd0 Not tainted 6.14.0-rc6+ #51
Mar 20 11:56:29 extra-ext4-4k kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.11-5 01/28/2025
Mar 20 11:56:29 extra-ext4-4k kernel: RIP: 0010:__zone_watermark_ok (mm/page_alloc.c:3256) 
Mar 20 11:56:29 extra-ext4-4k kernel: Code: 00 00 00 41 f7 c0 38 02 00 00 0f 85 2c 01 00 00 48 8b 4f 30 48 63 d2 48 01 ca 85 db 0f 84 f3 00 00 00 49 29 d1 bb 80 00 00 00 <4c> 03 54 f7 38 31 d2 4d 39 ca 0f 8d d2 00 00 00 ba 01 00 00 00 85
All code
========
   0:	00 00                	add    %al,(%rax)
   2:	00 41 f7             	add    %al,-0x9(%rcx)
   5:	c0 38 02             	sarb   $0x2,(%rax)
   8:	00 00                	add    %al,(%rax)
   a:	0f 85 2c 01 00 00    	jne    0x13c
  10:	48 8b 4f 30          	mov    0x30(%rdi),%rcx
  14:	48 63 d2             	movslq %edx,%rdx
  17:	48 01 ca             	add    %rcx,%rdx
  1a:	85 db                	test   %ebx,%ebx
  1c:	0f 84 f3 00 00 00    	je     0x115
  22:	49 29 d1             	sub    %rdx,%r9
  25:	bb 80 00 00 00       	mov    $0x80,%ebx
  2a:*	4c 03 54 f7 38       	add    0x38(%rdi,%rsi,8),%r10		<-- trapping instruction
  2f:	31 d2                	xor    %edx,%edx
  31:	4d 39 ca             	cmp    %r9,%r10
  34:	0f 8d d2 00 00 00    	jge    0x10c
  3a:	ba 01 00 00 00       	mov    $0x1,%edx
  3f:	85                   	.byte 0x85

Code starting with the faulting instruction
===========================================
   0:	4c 03 54 f7 38       	add    0x38(%rdi,%rsi,8),%r10
   5:	31 d2                	xor    %edx,%edx
   7:	4d 39 ca             	cmp    %r9,%r10
   a:	0f 8d d2 00 00 00    	jge    0xe2
  10:	ba 01 00 00 00       	mov    $0x1,%edx
  15:	85                   	.byte 0x85
Mar 20 11:56:29 extra-ext4-4k kernel: RSP: 0018:ffffa5bb002b7c78 EFLAGS: 00010206
Mar 20 11:56:29 extra-ext4-4k kernel: RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000
Mar 20 11:56:29 extra-ext4-4k kernel: RDX: 0000000000000000 RSI: 0000000000002431 RDI: ffff93097fff9840
Mar 20 11:56:29 extra-ext4-4k kernel: RBP: 0000000000000009 R08: 0000000000000080 R09: 0000000000005e90
Mar 20 11:56:29 extra-ext4-4k kernel: R10: 0000000000000c8e R11: 0000000000000c8e R12: 0000000000000000
Mar 20 11:56:29 extra-ext4-4k kernel: R13: 0000000000002431 R14: 0000000000000002 R15: 0000000000000000
Mar 20 11:56:29 extra-ext4-4k kernel: FS:  0000000000000000(0000) GS:ffff93097bc00000(0000) knlGS:0000000000000000
Mar 20 11:56:29 extra-ext4-4k kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 20 11:56:29 extra-ext4-4k kernel: CR2: ffff93098000ba00 CR3: 000000010c602004 CR4: 0000000000772ef0
Mar 20 11:56:29 extra-ext4-4k kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 20 11:56:29 extra-ext4-4k kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 20 11:56:29 extra-ext4-4k kernel: PKRU: 55555554
Mar 20 11:56:29 extra-ext4-4k kernel: Call Trace:
Mar 20 11:56:29 extra-ext4-4k kernel:  <TASK>
Mar 20 11:56:29 extra-ext4-4k kernel: ? __die_body.cold (arch/x86/kernel/dumpstack.c:478 (discriminator 1) arch/x86/kernel/dumpstack.c:465 (discriminator 1) arch/x86/kernel/dumpstack.c:420 (discriminator 1)) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? page_fault_oops (arch/x86/mm/fault.c:710 (discriminator 1)) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? search_module_extables (kernel/module/main.c:3733 (discriminator 3)) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? __zone_watermark_ok (mm/page_alloc.c:3256) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? search_bpf_extables (kernel/bpf/core.c:804) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? exc_page_fault (arch/x86/mm/fault.c:1182 (discriminator 1) arch/x86/mm/fault.c:1478 (discriminator 1) arch/x86/mm/fault.c:1538 (discriminator 1)) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:574) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? __zone_watermark_ok (mm/page_alloc.c:3256) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? asm_sysvec_call_function_single (./arch/x86/include/asm/idtentry.h:574) 
Mar 20 11:56:29 extra-ext4-4k kernel: compaction_suitable (mm/compaction.c:2438) 
Mar 20 11:56:29 extra-ext4-4k kernel: compaction_suit_allocation_order (mm/compaction.c:2525 (discriminator 1)) 
Mar 20 11:56:29 extra-ext4-4k kernel: kcompactd_do_work (mm/compaction.c:3106) 
Mar 20 11:56:29 extra-ext4-4k kernel: kcompactd (mm/compaction.c:3220) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? __pfx_autoremove_wake_function (kernel/sched/wait.c:383) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? __pfx_kcompactd (mm/compaction.c:3184) 
Mar 20 11:56:29 extra-ext4-4k kernel: kthread (kernel/kthread.c:464) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? __pfx_kthread (kernel/kthread.c:413) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? _raw_spin_unlock (./include/linux/spinlock_api_smp.h:143 (discriminator 3) kernel/locking/spinlock.c:186 (discriminator 3)) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? finish_task_switch.isra.0 (./arch/x86/include/asm/paravirt.h:691 kernel/sched/sched.h:1533 kernel/sched/core.c:5132 kernel/sched/core.c:5250) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? __pfx_kthread (kernel/kthread.c:413) 
Mar 20 11:56:29 extra-ext4-4k kernel: ret_from_fork (arch/x86/kernel/process.c:148) 
Mar 20 11:56:29 extra-ext4-4k kernel: ? __pfx_kthread (kernel/kthread.c:413) 
Mar 20 11:56:29 extra-ext4-4k kernel: ret_from_fork_asm (arch/x86/entry/entry_64.S:257) 
Mar 20 11:56:29 extra-ext4-4k kernel:  </TASK>
Mar 20 11:56:29 extra-ext4-4k kernel: Modules linked in: loop sunrpc 9p nls_iso8859_1 nls_cp437 vfat crc32c_generic fat kvm_intel kvm ghash_clmulni_intel sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 aesni_intel gf128mul crypto_simd 9pnet_virtio cryptd virtio_console virtio_balloon button evdev joydev serio_raw dm_mod nvme_fabrics drm nvme_core nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vsock autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 md_mod virtio_net net_failover failover virtio_blk psmouse virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring
Mar 20 11:56:29 extra-ext4-4k kernel: CR2: ffff93098000ba00
Mar 20 11:56:29 extra-ext4-4k kernel: ---[ end trace 0000000000000000 ]---
Mar 20 11:56:29 extra-ext4-4k kernel: RIP: 0010:__zone_watermark_ok (mm/page_alloc.c:3256) 
Mar 20 11:56:29 extra-ext4-4k kernel: Code: 00 00 00 41 f7 c0 38 02 00 00 0f 85 2c 01 00 00 48 8b 4f 30 48 63 d2 48 01 ca 85 db 0f 84 f3 00 00 00 49 29 d1 bb 80 00 00 00 <4c> 03 54 f7 38 31 d2 4d 39 ca 0f 8d d2 00 00 00 ba 01 00 00 00 85
All code
========
   0:	00 00                	add    %al,(%rax)
   2:	00 41 f7             	add    %al,-0x9(%rcx)
   5:	c0 38 02             	sarb   $0x2,(%rax)
   8:	00 00                	add    %al,(%rax)
   a:	0f 85 2c 01 00 00    	jne    0x13c
  10:	48 8b 4f 30          	mov    0x30(%rdi),%rcx
  14:	48 63 d2             	movslq %edx,%rdx
  17:	48 01 ca             	add    %rcx,%rdx
  1a:	85 db                	test   %ebx,%ebx
  1c:	0f 84 f3 00 00 00    	je     0x115
  22:	49 29 d1             	sub    %rdx,%r9
  25:	bb 80 00 00 00       	mov    $0x80,%ebx
  2a:*	4c 03 54 f7 38       	add    0x38(%rdi,%rsi,8),%r10		<-- trapping instruction
  2f:	31 d2                	xor    %edx,%edx
  31:	4d 39 ca             	cmp    %r9,%r10
  34:	0f 8d d2 00 00 00    	jge    0x10c
  3a:	ba 01 00 00 00       	mov    $0x1,%edx
  3f:	85                   	.byte 0x85

Code starting with the faulting instruction
===========================================
   0:	4c 03 54 f7 38       	add    0x38(%rdi,%rsi,8),%r10
   5:	31 d2                	xor    %edx,%edx
   7:	4d 39 ca             	cmp    %r9,%r10
   a:	0f 8d d2 00 00 00    	jge    0xe2
  10:	ba 01 00 00 00       	mov    $0x1,%edx
  15:	85                   	.byte 0x85
Mar 20 11:56:29 extra-ext4-4k kernel: RSP: 0018:ffffa5bb002b7c78 EFLAGS: 00010206
Mar 20 11:56:29 extra-ext4-4k kernel: RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000
Mar 20 11:56:29 extra-ext4-4k kernel: RDX: 0000000000000000 RSI: 0000000000002431 RDI: ffff93097fff9840
Mar 20 11:56:29 extra-ext4-4k kernel: RBP: 0000000000000009 R08: 0000000000000080 R09: 0000000000005e90
Mar 20 11:56:29 extra-ext4-4k kernel: R10: 0000000000000c8e R11: 0000000000000c8e R12: 0000000000000000
Mar 20 11:56:29 extra-ext4-4k kernel: R13: 0000000000002431 R14: 0000000000000002 R15: 0000000000000000
Mar 20 11:56:29 extra-ext4-4k kernel: FS:  0000000000000000(0000) GS:ffff93097bc00000(0000) knlGS:0000000000000000
Mar 20 11:56:29 extra-ext4-4k kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 20 11:56:29 extra-ext4-4k kernel: CR2: ffff93098000ba00 CR3: 000000010c602004 CR4: 0000000000772ef0
Mar 20 11:56:29 extra-ext4-4k kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 20 11:56:29 extra-ext4-4k kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 20 11:56:29 extra-ext4-4k kernel: PKRU: 55555554
Mar 20 11:56:29 extra-ext4-4k kernel: note: kcompactd0[74] exited with irqs disabled

  Luis


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-20 12:11                   ` Luis Chamberlain
@ 2025-03-20 12:18                     ` Luis Chamberlain
  2025-03-22 23:14                     ` Johannes Weiner
  1 sibling, 0 replies; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-20 12:18 UTC (permalink / raw)
  To: Matthew Wilcox, Johannes Weiner
  Cc: Jan Kara, Oliver Sang, David Hildenbrand, Alistair Popple,
	linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp,
	John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez,
	David Bueso

On Thu, Mar 20, 2025 at 05:11:21AM -0700, Luis Chamberlain wrote:
> Sure, the culprit is the patch titled:
> 
> mm: page_alloc: trace type pollution from compaction capturing

Sorry.. that's incorrect, the right title is:

mm: compaction: push watermark into compaction_suitable() callers

  Luis


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-20 12:11                   ` Luis Chamberlain
  2025-03-20 12:18                     ` Luis Chamberlain
@ 2025-03-22 23:14                     ` Johannes Weiner
  2025-03-23  1:02                       ` Luis Chamberlain
  1 sibling, 1 reply; 31+ messages in thread
From: Johannes Weiner @ 2025-03-22 23:14 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Matthew Wilcox, Jan Kara, Oliver Sang, David Hildenbrand,
	Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke,
	oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav,
	Daniel Gomez, David Bueso

Hey Luis,

On Thu, Mar 20, 2025 at 05:11:19AM -0700, Luis Chamberlain wrote:
> On Wed, Mar 19, 2025 at 07:24:23PM +0000, Matthew Wilcox wrote:
> > On Wed, Mar 19, 2025 at 12:16:41PM -0700, Luis Chamberlain wrote:
> > > On Wed, Mar 19, 2025 at 09:55:11AM -0700, Luis Chamberlain wrote:
> > > > FWIW, I'm not seeing this crash or any kernel splat within the
> > > > same time (I'll let this run the full 2.5 hours now to verify) on
> > > > vanilla 6.14.0-rc3 + the 64k-sector-size patches, which would explain why I
> > > > hadn't seen this in my earlier testing over 10 ext4 profiles on fstests. This
> > > > particular crash seems likely to be an artifact on the development cycle on
> > > > next-20250317.
> > > 
> > > I confirm that with a vanilla 6.14.0-rc3 + the 64k-sector-size patches a 2.5
> > > hour run generic/750 doesn't crash at all. So indeed something on the
> > > development cycle leads to this particular crash.
> > 
> > We can't debug two problems at once.
> > 
> > For the first problem, I've demonstrated what the cause is, and that's
> > definitely introduced by your patch, so we need to figure out a
> > solution.
> 
> Sure, yeah I followed that.
> 
> > For the second problem, we don't know what it is.  Do you want to bisect
> > it to figure out which commit introduced it?
> 
> Sure, the culprit is the patch titled:
> 
> mm: page_alloc: trace type pollution from compaction capturing
> 
> Johannes, any ideas? You can reproduce this easily (1-2 minutes) by running
> the fstests test generic/750 on linux-next against ext4 with a 4k block
> size filesystem.

Sorry for the late reply, I just saw your emails now.

> Below is the splat decoded.
> 
> Mar 20 11:52:55 extra-ext4-4k kernel: Linux version 6.14.0-rc6+ (mcgrof@beefy) (gcc (Debian 14.2.0-16) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #51 SMP PREEMPT_DYNAMIC Thu Mar 20 11:50:32 UTC 2025
> Mar 20 11:52:55 extra-ext4-4k kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.14.0-rc6+ root=PARTUUID=503fa6f2-2d5b-4d7e-8cf8-3a811de326ce ro console=tty0 console=tty1 console=ttyS0,115200n8 console=ttyS0
> 
> < -- etc -->
> 
> Mar 20 11:55:27 extra-ext4-4k unknown: run fstests generic/750 at 2025-03-20 11:55:27
> Mar 20 11:55:28 extra-ext4-4k kernel: EXT4-fs (loop5): mounted filesystem c20cbdee-a370-4743-80aa-95dec0beaaa2 r/w with ordered data mode. Quota mode: none.
> Mar 20 11:56:29 extra-ext4-4k kernel: BUG: unable to handle page fault for address: ffff93098000ba00
> Mar 20 11:56:29 extra-ext4-4k kernel: #PF: supervisor read access in kernel mode
> Mar 20 11:56:29 extra-ext4-4k kernel: #PF: error_code(0x0000) - not-present page
> Mar 20 11:56:29 extra-ext4-4k kernel: PGD 3a201067 P4D 3a201067 PUD 0
> Mar 20 11:56:29 extra-ext4-4k kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> Mar 20 11:56:29 extra-ext4-4k kernel: CPU: 0 UID: 0 PID: 74 Comm: kcompactd0 Not tainted 6.14.0-rc6+ #51
> Mar 20 11:56:29 extra-ext4-4k kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.11-5 01/28/2025
> Mar 20 11:56:29 extra-ext4-4k kernel: RIP: 0010:__zone_watermark_ok (mm/page_alloc.c:3256) 
> Mar 20 11:56:29 extra-ext4-4k kernel: Code: 00 00 00 41 f7 c0 38 02 00 00 0f 85 2c 01 00 00 48 8b 4f 30 48 63 d2 48 01 ca 85 db 0f 84 f3 00 00 00 49 29 d1 bb 80 00 00 00 <4c> 03 54 f7 38 31 d2 4d 39 ca 0f 8d d2 00 00 00 ba 01 00 00 00 85
> All code
> ========
>    0:	00 00                	add    %al,(%rax)
>    2:	00 41 f7             	add    %al,-0x9(%rcx)
>    5:	c0 38 02             	sarb   $0x2,(%rax)
>    8:	00 00                	add    %al,(%rax)
>    a:	0f 85 2c 01 00 00    	jne    0x13c
>   10:	48 8b 4f 30          	mov    0x30(%rdi),%rcx
>   14:	48 63 d2             	movslq %edx,%rdx
>   17:	48 01 ca             	add    %rcx,%rdx
>   1a:	85 db                	test   %ebx,%ebx
>   1c:	0f 84 f3 00 00 00    	je     0x115
>   22:	49 29 d1             	sub    %rdx,%r9
>   25:	bb 80 00 00 00       	mov    $0x80,%ebx
>   2a:*	4c 03 54 f7 38       	add    0x38(%rdi,%rsi,8),%r10		<-- trapping instruction

This looks like the same issue the bot reported here:

https://lore.kernel.org/all/20250321135524.GA1888695@cmpxchg.org/

There is a fix for it queued in next-20250318 and later. Could you
please double check with your reproducer against a more recent next?

Thanks


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-22 23:14                     ` Johannes Weiner
@ 2025-03-23  1:02                       ` Luis Chamberlain
  2025-03-23  7:07                         ` Luis Chamberlain
  0 siblings, 1 reply; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-23  1:02 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Matthew Wilcox, Jan Kara, Oliver Sang, David Hildenbrand,
	Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke,
	oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav,
	Daniel Gomez, David Bueso

On Sat, Mar 22, 2025 at 07:14:40PM -0400, Johannes Weiner wrote:
> Hey Luis,
> 
> This looks like the same issue the bot reported here:
> 
> https://lore.kernel.org/all/20250321135524.GA1888695@cmpxchg.org/
> 
> There is a fix for it queued in next-20250318 and later. Could you
> please double check with your reproducer against a more recent next?

Confirmed, at least it's been 30 minutes and no crashes now, whereas
before it would crash in 1 minute. I'll let it soak for 2.5 hours in
the hopes I can trigger the warning originally reported by this thread.

Even though from code inspection I see how the kernel warning would
trigger, I just want to force-trigger it in a test, and I can't yet.

  Luis


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-23  1:02                       ` Luis Chamberlain
@ 2025-03-23  7:07                         ` Luis Chamberlain
  2025-03-25  6:52                           ` Oliver Sang
  0 siblings, 1 reply; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-23  7:07 UTC (permalink / raw)
  To: Johannes Weiner, Oliver Sang
  Cc: Matthew Wilcox, Jan Kara, David Hildenbrand, Alistair Popple,
	linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp,
	John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez,
	David Bueso

On Sat, Mar 22, 2025 at 06:02:13PM -0700, Luis Chamberlain wrote:
> On Sat, Mar 22, 2025 at 07:14:40PM -0400, Johannes Weiner wrote:
> > Hey Luis,
> > 
> > This looks like the same issue the bot reported here:
> > 
> > https://lore.kernel.org/all/20250321135524.GA1888695@cmpxchg.org/
> > 
> > There is a fix for it queued in next-20250318 and later. Could you
> > please double check with your reproducer against a more recent next?
> 
> Confirmed, at least it's been 30 minutes and no crashes now, whereas
> before it would crash in 1 minute. I'll let it soak for 2.5 hours in
> the hopes I can trigger the warning originally reported by this thread.
> 
> Even though from code inspection I see how the kernel warning would
> trigger I just want to force trigger it on a test, and I can't yet.

Survived 5 hours now. This certainly fixed that crash.

As for the kernel warning, I can't yet reproduce that, so I'm trying to
run generic/750 forever while looping
./testcases/kernel/syscalls/close_range/close_range01
and yet nothing.

Oliver, can you reproduce the kernel warning on next-20250321?

  Luis


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-23  7:07                         ` Luis Chamberlain
@ 2025-03-25  6:52                           ` Oliver Sang
  2025-03-28  1:44                             ` Luis Chamberlain
  0 siblings, 1 reply; 31+ messages in thread
From: Oliver Sang @ 2025-03-25  6:52 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Johannes Weiner, Matthew Wilcox, Jan Kara, David Hildenbrand,
	Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke,
	oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav,
	Daniel Gomez, David Bueso, oliver.sang

[-- Attachment #1: Type: text/plain, Size: 6570 bytes --]

hi, Luis,

On Sun, Mar 23, 2025 at 12:07:27AM -0700, Luis Chamberlain wrote:
> On Sat, Mar 22, 2025 at 06:02:13PM -0700, Luis Chamberlain wrote:
> > On Sat, Mar 22, 2025 at 07:14:40PM -0400, Johannes Weiner wrote:
> > > Hey Luis,
> > > 
> > > This looks like the same issue the bot reported here:
> > > 
> > > https://lore.kernel.org/all/20250321135524.GA1888695@cmpxchg.org/
> > > 
> > > There is a fix for it queued in next-20250318 and later. Could you
> > > please double check with your reproducer against a more recent next?
> > 
> > Confirmed, at least it's been 30 minutes and no crashes now, whereas
> > before it would crash in 1 minute. I'll let it soak for 2.5 hours in
> > the hopes I can trigger the warning originally reported by this thread.
> > 
> > Even though from code inspection I see how the kernel warning would
> > trigger I just want to force trigger it on a test, and I can't yet.
> 
> Survived 5 hours now. This certainly fixed that crash.
> 
> As for the kernel warning, I can't yet reproduce that, so trying to
> run generic/750 forever and looping
> ./testcases/kernel/syscalls/close_range/close_range01
> and yet nothing.
> 
> Oliver can you reproduce the kernel warning on next-20250321 ?

the issue still exists on
9388ec571cb1ad (tag: next-20250321, linux-next/master) Add linux-next specific files for 20250321

but only randomly (reproduced 7 times in 12 runs, where ltp.close_range01 also failed;
in the other 5 runs, the issue could not be reproduced and ltp.close_range01 passed).

one dmesg is attached FYI.

kern  :err   : [  215.378500] BUG: sleeping function called from invalid context at mm/util.c:743
kern  :err   : [  215.386652] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 52, name: kcompactd0
kern  :err   : [  215.395438] preempt_count: 1, expected: 0
kern  :err   : [  215.400216] RCU nest depth: 0, expected: 0
kern  :warn  : [  215.405081] CPU: 0 UID: 0 PID: 52 Comm: kcompactd0 Tainted: G S                  6.14.0-rc7-next-20250321 #1 PREEMPT(voluntary) 
kern  :warn  : [  215.405095] Tainted: [S]=CPU_OUT_OF_SPEC
kern  :warn  : [  215.405097] Hardware name: Hewlett-Packard HP Pro 3340 MT/17A1, BIOS 8.07 01/24/2013
kern  :warn  : [  215.405101] Call Trace:
kern  :warn  : [  215.405104]  <TASK>
kern  :warn  : [  215.405107]  dump_stack_lvl+0x4f/0x70
kern  :warn  : [  215.405118]  __might_resched+0x2c6/0x450
kern  :warn  : [  215.405128]  folio_mc_copy+0xca/0x1f0
kern  :warn  : [  215.405137]  ? _raw_spin_lock+0x80/0xe0
kern  :warn  : [  215.405145]  __migrate_folio+0x117/0x2e0
kern  :warn  : [  215.405154]  __buffer_migrate_folio+0x563/0x670
kern  :warn  : [  215.405161]  move_to_new_folio+0xf5/0x410
kern  :warn  : [  215.405168]  migrate_folio_move+0x210/0x770
kern  :warn  : [  215.405173]  ? __pfx_compaction_free+0x10/0x10
kern  :warn  : [  215.405181]  ? __pfx_migrate_folio_move+0x10/0x10
kern  :warn  : [  215.405187]  ? compaction_alloc_noprof+0x441/0x720
kern  :warn  : [  215.405195]  ? __pfx_compaction_alloc+0x10/0x10
kern  :warn  : [  215.405202]  ? __pfx_compaction_free+0x10/0x10
kern  :warn  : [  215.405208]  ? __pfx_compaction_free+0x10/0x10
kern  :warn  : [  215.405213]  ? migrate_folio_unmap+0x329/0x890
kern  :warn  : [  215.405221]  migrate_pages_batch+0xe67/0x1800
kern  :warn  : [  215.405227]  ? __pfx_compaction_free+0x10/0x10
kern  :warn  : [  215.405236]  ? __pfx_migrate_pages_batch+0x10/0x10
kern  :warn  : [  215.405243]  ? pick_next_task_fair+0x304/0xba0
kern  :warn  : [  215.405253]  ? finish_task_switch+0x155/0x750
kern  :warn  : [  215.405260]  ? __switch_to+0x5ba/0x1020
kern  :warn  : [  215.405268]  migrate_pages_sync+0x10b/0x8e0
kern  :warn  : [  215.405275]  ? __pfx_compaction_alloc+0x10/0x10
kern  :warn  : [  215.405281]  ? __pfx_compaction_free+0x10/0x10
kern  :warn  : [  215.405289]  ? __pfx_migrate_pages_sync+0x10/0x10
kern  :warn  : [  215.405295]  ? set_pfnblock_flags_mask+0x178/0x220
kern  :warn  : [  215.405303]  ? __pfx_lru_gen_del_folio+0x10/0x10
kern  :warn  : [  215.405310]  ? __pfx_compaction_alloc+0x10/0x10
kern  :warn  : [  215.405316]  ? __pfx_compaction_free+0x10/0x10
kern  :warn  : [  215.405323]  migrate_pages+0x842/0xe30
kern  :warn  : [  215.405331]  ? __pfx_compaction_alloc+0x10/0x10
kern  :warn  : [  215.405337]  ? __pfx_compaction_free+0x10/0x10
kern  :warn  : [  215.405345]  ? __pfx_migrate_pages+0x10/0x10
kern  :warn  : [  215.405351]  ? __compact_finished+0x91b/0xbd0
kern  :warn  : [  215.405359]  ? isolate_migratepages+0x32d/0xbd0
kern  :warn  : [  215.405367]  compact_zone+0x9df/0x16c0
kern  :warn  : [  215.405377]  ? __pfx_compact_zone+0x10/0x10
kern  :warn  : [  215.405383]  ? _raw_spin_lock_irqsave+0x86/0xe0
kern  :warn  : [  215.405390]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
kern  :warn  : [  215.405397]  compact_node+0x158/0x250
kern  :warn  : [  215.405405]  ? __pfx_compact_node+0x10/0x10
kern  :warn  : [  215.405416]  ? __pfx_extfrag_for_order+0x10/0x10
kern  :warn  : [  215.405425]  ? __pfx_mutex_unlock+0x10/0x10
kern  :warn  : [  215.405432]  ? finish_wait+0xd1/0x280
kern  :warn  : [  215.405441]  kcompactd+0x5d0/0xa30
kern  :warn  : [  215.405450]  ? __pfx_kcompactd+0x10/0x10
kern  :warn  : [  215.405456]  ? _raw_spin_lock_irqsave+0x86/0xe0
kern  :warn  : [  215.405462]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
kern  :warn  : [  215.405469]  ? __pfx_autoremove_wake_function+0x10/0x10
kern  :warn  : [  215.405477]  ? __kthread_parkme+0xba/0x1e0
kern  :warn  : [  215.405485]  ? __pfx_kcompactd+0x10/0x10
kern  :warn  : [  215.405492]  kthread+0x3a0/0x770
kern  :warn  : [  215.405498]  ? __pfx_kthread+0x10/0x10
kern  :warn  : [  215.405504]  ? __pfx_kthread+0x10/0x10
kern  :warn  : [  215.405510]  ret_from_fork+0x30/0x70
kern  :warn  : [  215.405516]  ? __pfx_kthread+0x10/0x10
kern  :warn  : [  215.405521]  ret_from_fork_asm+0x1a/0x30
kern  :warn  : [  215.405530]  </TASK>
user  :notice: [  216.962224] Modules Loaded         netconsole btrfs blake2b_generic xor zstd_compress raid6_pq snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp i915 sd_mod sg kvm_intel intel_gtt ipmi_devintf ipmi_msghandler cec kvm drm_buddy snd_hda_intel snd_intel_dspcfg ttm snd_intel_sdw_acpi ghash_clmulni_intel drm_display_helper snd_hda_codec rapl drm_client_lib intel_cstate snd_hda_core drm_kms_helper ahci snd_hwdep libahci snd_pcm wmi_bmof mei_me video intel_uncore mei lpc_ich libata snd_timer pcspkr snd i2c_i801 i2c_smbus soundcore wmi binfmt_misc loop drm fuse dm_mod ip_tables


> 
>   Luis

[-- Attachment #2: kmsg.xz --]
[-- Type: application/x-xz, Size: 31488 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-25  6:52                           ` Oliver Sang
@ 2025-03-28  1:44                             ` Luis Chamberlain
  2025-03-28  4:21                               ` Luis Chamberlain
  0 siblings, 1 reply; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-28  1:44 UTC (permalink / raw)
  To: Oliver Sang
  Cc: Johannes Weiner, Matthew Wilcox, Jan Kara, David Hildenbrand,
	Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke,
	oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav,
	Daniel Gomez, David Bueso

On Tue, Mar 25, 2025 at 02:52:49PM +0800, Oliver Sang wrote:
> hi, Luis,
> 
> On Sun, Mar 23, 2025 at 12:07:27AM -0700, Luis Chamberlain wrote:
> > On Sat, Mar 22, 2025 at 06:02:13PM -0700, Luis Chamberlain wrote:
> > > On Sat, Mar 22, 2025 at 07:14:40PM -0400, Johannes Weiner wrote:
> > > > Hey Luis,
> > > > 
> > > > This looks like the same issue the bot reported here:
> > > > 
> > > > https://lore.kernel.org/all/20250321135524.GA1888695@cmpxchg.org/
> > > > 
> > > > There is a fix for it queued in next-20250318 and later. Could you
> > > > please double check with your reproducer against a more recent next?
> > > 
> > > Confirmed, at least it's been 30 minutes and no crashes now, whereas
> > > before it would crash in 1 minute. I'll let it soak for 2.5 hours in
> > > the hopes I can trigger the warning originally reported by this thread.
> > > 
> > > Even though from code inspection I see how the kernel warning would
> > > trigger I just want to force trigger it on a test, and I can't yet.
> > 
> > Survived 5 hours now. This certainly fixed that crash.
> > 
> > As for the kernel warning, I can't yet reproduce that, so trying to
> > run generic/750 forever and looping
> > ./testcases/kernel/syscalls/close_range/close_range01
> > and yet nothing.
> > 
> > Oliver can you reproduce the kernel warning on next-20250321 ?
> 
> the issue still exists on
> 9388ec571cb1ad (tag: next-20250321, linux-next/master) Add linux-next specific files for 20250321
> 
> but only randomly (reproduced 7 times in 12 runs, where ltp.close_range01 also failed;
> in the other 5 runs, the issue could not be reproduced and ltp.close_range01 passed).

OK, I narrowed down a reproducer, which requires the patch below:


diff --git a/mm/util.c b/mm/util.c
index 448117da071f..3585bdb8700a 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -735,6 +735,8 @@ int folio_mc_copy(struct folio *dst, struct folio *src)
 	long nr = folio_nr_pages(src);
 	long i = 0;
 
+	might_sleep();
+
 	for (;;) {
 		if (copy_mc_highpage(folio_page(dst, i), folio_page(src, i)))
 			return -EHWPOISON;


And then just running:

dd if=/dev/zero of=/dev/vde bs=1024M count=1024

For some reason a kernel with the following didn't trigger it, so the
above patch is needed:


CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_ACPI_SLEEP=y

It may have to do with my preemption settings:

CONFIG_PREEMPT_BUILD=y
CONFIG_ARCH_HAS_PREEMPT_LAZY=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
# CONFIG_PREEMPT_LAZY is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_RCU=y

And so now to see how we should fix it.

  Luis




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-28  1:44                             ` Luis Chamberlain
@ 2025-03-28  4:21                               ` Luis Chamberlain
  2025-03-28  9:47                                 ` Luis Chamberlain
  0 siblings, 1 reply; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-28  4:21 UTC (permalink / raw)
  To: Jan Kara, Kefeng Wang
  Cc: Johannes Weiner, Oliver Sang, Matthew Wilcox, David Hildenbrand,
	Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke,
	oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav,
	Daniel Gomez, David Bueso

On Thu, Mar 27, 2025 at 06:44:56PM -0700, Luis Chamberlain wrote:
> On Tue, Mar 25, 2025 at 02:52:49PM +0800, Oliver Sang wrote:
> > hi, Luis,
> > 
> > On Sun, Mar 23, 2025 at 12:07:27AM -0700, Luis Chamberlain wrote:
> > > On Sat, Mar 22, 2025 at 06:02:13PM -0700, Luis Chamberlain wrote:
> > > > On Sat, Mar 22, 2025 at 07:14:40PM -0400, Johannes Weiner wrote:
> > > > > Hey Luis,
> > > > > 
> > > > > This looks like the same issue the bot reported here:
> > > > > 
> > > > > https://lore.kernel.org/all/20250321135524.GA1888695@cmpxchg.org/
> > > > > 
> > > > > There is a fix for it queued in next-20250318 and later. Could you
> > > > > please double check with your reproducer against a more recent next?
> > > > 
> > > > Confirmed, at least it's been 30 minutes and no crashes now, whereas
> > > > before it would crash in 1 minute. I'll let it soak for 2.5 hours in
> > > > the hopes I can trigger the warning originally reported by this thread.
> > > > 
> > > > Even though from code inspection I see how the kernel warning would
> > > > trigger I just want to force trigger it on a test, and I can't yet.
> > > 
> > > Survived 5 hours now. This certainly fixed that crash.
> > > 
> > > As for the kernel warning, I can't yet reproduce that, so trying to
> > > run generic/750 forever and looping
> > > ./testcases/kernel/syscalls/close_range/close_range01
> > > and yet nothing.
> > > 
> > > Oliver can you reproduce the kernel warning on next-20250321 ?
> > 
> > the issue still exists on
> > 9388ec571cb1ad (tag: next-20250321, linux-next/master) Add linux-next specific files for 20250321
> > 
> > but only randomly (reproduced 7 times in 12 runs, where ltp.close_range01 also failed;
> > in the other 5 runs, the issue could not be reproduced and ltp.close_range01 passed).
> 
> OK, I narrowed down a reproducer, which requires the patch below:
> 
> 
> diff --git a/mm/util.c b/mm/util.c
> index 448117da071f..3585bdb8700a 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -735,6 +735,8 @@ int folio_mc_copy(struct folio *dst, struct folio *src)
>  	long nr = folio_nr_pages(src);
>  	long i = 0;
>  
> +	might_sleep();
> +
>  	for (;;) {
>  		if (copy_mc_highpage(folio_page(dst, i), folio_page(src, i)))
>  			return -EHWPOISON;
> 
> 
> And  then just running:
> 
> dd if=/dev/zero of=/dev/vde bs=1024M count=1024
> 
> For some reason a kernel with the following didn't trigger it so the
> above patch is needed
> 
> 
> CONFIG_PROVE_LOCKING=y
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_ACPI_SLEEP=y
> 
> It may have to do with my preemption settings:
> 
> CONFIG_PREEMPT_BUILD=y
> CONFIG_ARCH_HAS_PREEMPT_LAZY=y
> # CONFIG_PREEMPT_NONE is not set
> CONFIG_PREEMPT_VOLUNTARY=y
> # CONFIG_PREEMPT is not set
> # CONFIG_PREEMPT_LAZY is not set
> CONFIG_PREEMPT_COUNT=y
> CONFIG_PREEMPTION=y
> CONFIG_PREEMPT_DYNAMIC=y
> CONFIG_PREEMPT_RCU=y
> 
> And so now to see how we should fix it.

Would the extra ref check added via commit 060913999d7a9e50 ("mm:
migrate: support poisoned recover from migrate folio") make the removal
of the spin lock safe now, given all the buffers are locked from the
folio? This survives some basic sanity checks on my end with
generic/750 against ext4 and also filling a drive at the same time with
fio. I have a feeling we are not sure; do we have a reproducer for
the issue reported through ebdf4de5642fb6 ("mm: migrate: fix reference
check race between __find_get_block() and migration")? I suspect the
answer is no. The only other thing I can think of at this time is to add
the lru_cache_disabled() || cpu_is_isolated(smp_processor_id()) checks
to __find_get_block_slow() as we do in bh_lru_install(), but I am not
sure if that suffices for the old races.
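
For reference, the check in bh_lru_install() that I'm referring to looks
roughly like this (paraphrased from fs/buffer.c, so treat it as a sketch
rather than the exact upstream code):

	check_irqs_on();
	bh_lru_lock();

	/*
	 * Buffer heads sitting in the per-CPU bh LRUs hold an extra
	 * reference, which can make buffer migration fail, so skip
	 * installing new entries while the LRU caches are disabled
	 * (i.e. during migration) or on isolated CPUs.
	 */
	if (lru_cache_disabled() || cpu_is_isolated(smp_processor_id())) {
		bh_lru_unlock();
		return;
	}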

Thoughts?

diff --git a/mm/migrate.c b/mm/migrate.c
index 97f0edf0c032..6a5d125ecde9 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -859,12 +859,12 @@ static int __buffer_migrate_folio(struct address_space *mapping,
 			}
 			bh = bh->b_this_page;
 		} while (bh != head);
+		spin_unlock(&mapping->i_private_lock);
 		if (busy) {
 			if (invalidated) {
 				rc = -EAGAIN;
 				goto unlock_buffers;
 			}
-			spin_unlock(&mapping->i_private_lock);
 			invalidate_bh_lrus();
 			invalidated = true;
 			goto recheck_buffers;
@@ -882,8 +882,6 @@ static int __buffer_migrate_folio(struct address_space *mapping,
 	} while (bh != head);
 
 unlock_buffers:
-	if (check_refs)
-		spin_unlock(&mapping->i_private_lock);
 	bh = head;
 	do {
 		unlock_buffer(bh);
diff --git a/mm/util.c b/mm/util.c
index 448117da071f..3585bdb8700a 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -735,6 +735,8 @@ int folio_mc_copy(struct folio *dst, struct folio *src)
 	long nr = folio_nr_pages(src);
 	long i = 0;
 
+	might_sleep();
+
 	for (;;) {
 		if (copy_mc_highpage(folio_page(dst, i), folio_page(src, i)))
 			return -EHWPOISON;


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-28  4:21                               ` Luis Chamberlain
@ 2025-03-28  9:47                                 ` Luis Chamberlain
  2025-03-28 19:09                                   ` Luis Chamberlain
  0 siblings, 1 reply; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-28  9:47 UTC (permalink / raw)
  To: Jan Kara, Kefeng Wang
  Cc: Johannes Weiner, Oliver Sang, Matthew Wilcox, David Hildenbrand,
	Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke,
	oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav,
	Daniel Gomez, David Bueso

On Thu, Mar 27, 2025 at 09:21:30PM -0700, Luis Chamberlain wrote:
> Would the extra ref check added via commit 060913999d7a9e50 ("mm:
> migrate: support poisoned recover from migrate folio") make the removal
> of the spin lock safe now given all the buffers are locked from the
> folio? This survives some basic sanity checks on my end with
> generic/750 against ext4 and also filling a drive at the same time with
> fio. I have a feeling we are not sure; do we have a reproducer for
> the issue reported through ebdf4de5642fb6 ("mm: migrate: fix reference
> check race between __find_get_block() and migration")? I suspect the
> answer is no.

<-- snip -->

> diff --git a/mm/migrate.c b/mm/migrate.c
> index 97f0edf0c032..6a5d125ecde9 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -859,12 +859,12 @@ static int __buffer_migrate_folio(struct address_space *mapping,
>  			}
>  			bh = bh->b_this_page;
>  		} while (bh != head);
> +		spin_unlock(&mapping->i_private_lock);
>  		if (busy) {
>  			if (invalidated) {
>  				rc = -EAGAIN;
>  				goto unlock_buffers;
>  			}
> -			spin_unlock(&mapping->i_private_lock);
>  			invalidate_bh_lrus();
>  			invalidated = true;
>  			goto recheck_buffers;
> @@ -882,8 +882,6 @@ static int __buffer_migrate_folio(struct address_space *mapping,
>  	} while (bh != head);
>  
>  unlock_buffers:
> -	if (check_refs)
> -		spin_unlock(&mapping->i_private_lock);
>  	bh = head;
>  	do {
>  		unlock_buffer(bh);
> diff --git a/mm/util.c b/mm/util.c
> index 448117da071f..3585bdb8700a 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -735,6 +735,8 @@ int folio_mc_copy(struct folio *dst, struct folio *src)
>  	long nr = folio_nr_pages(src);
>  	long i = 0;
>  
> +	might_sleep();
> +
>  	for (;;) {
>  		if (copy_mc_highpage(folio_page(dst, i), folio_page(src, i)))
>  			return -EHWPOISON;

Nah, this ends up producing the following, so I'm inclined at this point
to just revert the 64k block size enablement until we get this figured
out, because I can't think of an easy quick solution to this.


Mar 28 03:35:30 extra-ext4-4k kernel: Linux version 6.14.0-rc7-next-20250321-dirty (mcgrof@beef) (gcc (Debian 14.2.0-16) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #57 SMP PREEMPT_DYNAMIC Fri Mar 28 03:33:04 UTC 2025
Mar 28 03:35:30 extra-ext4-4k kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.14.0-rc7-next-20250321-dirty root=PARTUUID=503fa6f2-2d5b-4d7e-8cf8-3a811de326ce ro console=tty0 console=tty1 console=ttyS0,115200n8 console=ttyS0

<-- snip -->

Mar 28 03:36:32 extra-ext4-4k kernel: EXT4-fs (loop16): mounted filesystem 90cdb700-ad4a-4261-a1be-4f4627772317 r/w with ordered data mode. Quota mode: none.
Mar 28 03:36:37 extra-ext4-4k kernel: EXT4-fs (loop5): mounted filesystem fef0662d-01fc-483d-87ac-8e4ef2939de3 r/w with ordered data mode. Quota mode: none.
Mar 28 03:36:37 extra-ext4-4k kernel: EXT4-fs (loop5): unmounting filesystem fef0662d-01fc-483d-87ac-8e4ef2939de3.
Mar 28 03:36:37 extra-ext4-4k kernel: EXT4-fs (loop16): unmounting filesystem 90cdb700-ad4a-4261-a1be-4f4627772317.
Mar 28 03:36:37 extra-ext4-4k kernel: EXT4-fs (loop16): mounted filesystem 90cdb700-ad4a-4261-a1be-4f4627772317 r/w with ordered data mode. Quota mode: none.
Mar 28 03:36:37 extra-ext4-4k unknown: run fstests generic/750 at 2025-03-28 03:36:37
Mar 28 03:36:39 extra-ext4-4k kernel: EXT4-fs (loop5): mounted filesystem ed8a8fa0-0ea1-4820-aa26-366cd64a6e36 r/w with ordered data mode. Quota mode: none.
Mar 28 03:39:06 extra-ext4-4k kernel: rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P7603 } 8 jiffies s: 565 root: 0x0/T
Mar 28 03:39:06 extra-ext4-4k kernel: rcu: blocking rcu_node structures (internal RCU debug):
Mar 28 03:59:47 extra-ext4-4k kernel: NOHZ tick-stop error: local softirq work is pending, handler #10!!!
Mar 28 04:24:47 extra-ext4-4k kernel: ------------[ cut here ]------------
Mar 28 04:24:47 extra-ext4-4k kernel: WARNING: CPU: 7 PID: 1790 at mm/slub.c:4756 free_large_kmalloc+0xc1/0x100
Mar 28 04:24:47 extra-ext4-4k kernel: Modules linked in: loop sunrpc 9p nls_iso8859_1 kvm_intel nls_cp437 vfat crc32c_generic fat kvm ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel gf128mul crypto_simd cryptd 9pnet_virtio virtio_balloon virtio_console evdev button joydev serio_raw nvme_fabrics nvme_core dm_mod drm nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vsock autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 md_mod virtio_net net_failover failover virtio_blk virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev psmouse virtio virtio_ring
Mar 28 04:24:47 extra-ext4-4k kernel: CPU: 7 UID: 0 PID: 1790 Comm: fsstress Not tainted 6.14.0-rc7-next-20250321-dirty #57 PREEMPT(full) 
Mar 28 04:24:47 extra-ext4-4k kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.11-5 01/28/2025
Mar 28 04:24:47 extra-ext4-4k kernel: RIP: 0010:free_large_kmalloc+0xc1/0x100
Mar 28 04:24:47 extra-ext4-4k kernel: Code: f8 00 00 00 75 24 0f 0b 80 3d de 57 3b 01 00 0f 84 4f 63 be ff bd 00 f0 ff ff eb 8e 48 c7 c6 10 03 27 90 e8 61 32 fa ff 0f 0b <0f> 0b 48 83 c4 08 48 89 df 48 c7 c6 18 db 31 90 5b 5d e9 48 32 fa
Mar 28 04:24:47 extra-ext4-4k kernel: RSP: 0018:ffffa95942a67ac8 EFLAGS: 00010202
Mar 28 04:24:47 extra-ext4-4k kernel: RAX: 00000000000000ff RBX: fffffc63c4219c40 RCX: 0000000000000001
Mar 28 04:24:47 extra-ext4-4k kernel: RDX: 0000000000000000 RSI: ffff978e08671000 RDI: fffffc63c4219c40
Mar 28 04:24:47 extra-ext4-4k kernel: RBP: 0000000000000000 R08: 0000000000000020 R09: fffffffffffffff0
Mar 28 04:24:47 extra-ext4-4k kernel: R10: 00000000000000a0 R11: 0000000000000004 R12: 0000000000000000
Mar 28 04:24:47 extra-ext4-4k kernel: R13: ffff978e08671000 R14: 0000000000000000 R15: ffff978d03bf1000
Mar 28 04:24:47 extra-ext4-4k kernel: FS:  00007fefc4670740(0000) GS:ffff978eecda0000(0000) knlGS:0000000000000000
Mar 28 04:24:47 extra-ext4-4k kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 28 04:24:47 extra-ext4-4k kernel: CR2: 00007fefc4872000 CR3: 0000000075fa6002 CR4: 0000000000772ef0
Mar 28 04:24:47 extra-ext4-4k kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 28 04:24:47 extra-ext4-4k kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 28 04:24:47 extra-ext4-4k kernel: PKRU: 55555554
Mar 28 04:24:47 extra-ext4-4k kernel: Call Trace:
Mar 28 04:24:47 extra-ext4-4k kernel:  <TASK>
Mar 28 04:24:47 extra-ext4-4k kernel:  ? __warn.cold+0xb7/0x14f
Mar 28 04:24:47 extra-ext4-4k kernel:  ? free_large_kmalloc+0xc1/0x100
Mar 28 04:24:47 extra-ext4-4k kernel:  ? report_bug+0xe6/0x170
Mar 28 04:24:47 extra-ext4-4k kernel:  ? free_large_kmalloc+0xc1/0x100
Mar 28 04:24:47 extra-ext4-4k kernel:  ? handle_bug+0x199/0x260
Mar 28 04:24:47 extra-ext4-4k kernel:  ? exc_invalid_op+0x13/0x60
Mar 28 04:24:47 extra-ext4-4k kernel:  ? asm_exc_invalid_op+0x16/0x20
Mar 28 04:24:47 extra-ext4-4k kernel:  ? free_large_kmalloc+0xc1/0x100
Mar 28 04:24:47 extra-ext4-4k kernel:  ext4_xattr_block_set+0x191/0x1200 [ext4]
Mar 28 04:24:47 extra-ext4-4k kernel:  ? xattr_find_entry+0x96/0x110 [ext4]
Mar 28 04:24:47 extra-ext4-4k kernel:  ext4_xattr_set_handle+0x572/0x630 [ext4]
Mar 28 04:24:47 extra-ext4-4k kernel:  ext4_xattr_set+0x7c/0x150 [ext4]
Mar 28 04:24:47 extra-ext4-4k kernel:  __vfs_removexattr+0x7c/0xb0
Mar 28 04:24:47 extra-ext4-4k kernel:  __vfs_removexattr_locked+0xb7/0x150
Mar 28 04:24:47 extra-ext4-4k kernel:  vfs_removexattr+0x58/0x100
Mar 28 04:24:47 extra-ext4-4k kernel:  path_removexattrat+0x17d/0x330
Mar 28 04:24:47 extra-ext4-4k kernel:  ? __do_sys_newfstatat+0x33/0x60
Mar 28 04:24:47 extra-ext4-4k kernel:  __x64_sys_removexattr+0x19/0x20
Mar 28 04:24:47 extra-ext4-4k kernel:  do_syscall_64+0x69/0x140
Mar 28 04:24:47 extra-ext4-4k kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Mar 28 04:24:47 extra-ext4-4k kernel: RIP: 0033:0x7fefc4781037
Mar 28 04:24:47 extra-ext4-4k kernel: Code: f0 ff ff 73 01 c3 48 8b 0d be 8d 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 c5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 8d 0d 00 f7 d8 64 89 01 48
Mar 28 04:24:47 extra-ext4-4k kernel: RSP: 002b:00007ffc2b5a5d48 EFLAGS: 00000206 ORIG_RAX: 00000000000000c5
Mar 28 04:24:47 extra-ext4-4k kernel: RAX: ffffffffffffffda RBX: 000000000002d937 RCX: 00007fefc4781037
Mar 28 04:24:47 extra-ext4-4k kernel: RDX: 0000000000000000 RSI: 00007ffc2b5a5d70 RDI: 0000563075ae5850
Mar 28 04:24:47 extra-ext4-4k kernel: RBP: 00007ffc2b5a5d70 R08: 0000000000000064 R09: 00000000ffffffff
Mar 28 04:24:47 extra-ext4-4k kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 00000000000030d4
Mar 28 04:24:47 extra-ext4-4k kernel: R13: 8f5c28f5c28f5c29 R14: 00007ffc2b5a5e20 R15: 0000563064291ea0
Mar 28 04:24:47 extra-ext4-4k kernel:  </TASK>
Mar 28 04:24:47 extra-ext4-4k kernel: irq event stamp: 94586373
Mar 28 04:24:47 extra-ext4-4k kernel: hardirqs last  enabled at (94586383): [<ffffffff8f19ee1e>] __up_console_sem+0x5e/0x70
Mar 28 04:24:47 extra-ext4-4k kernel: hardirqs last disabled at (94586394): [<ffffffff8f19ee03>] __up_console_sem+0x43/0x70
Mar 28 04:24:47 extra-ext4-4k kernel: softirqs last  enabled at (94585948): [<ffffffff8f0ffa53>] __irq_exit_rcu+0xc3/0x120
Mar 28 04:24:47 extra-ext4-4k kernel: softirqs last disabled at (94585929): [<ffffffff8f0ffa53>] __irq_exit_rcu+0xc3/0x120
Mar 28 04:24:47 extra-ext4-4k kernel: ---[ end trace 0000000000000000 ]---
Mar 28 04:24:47 extra-ext4-4k kernel: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x402a88 pfn:0x108671
Mar 28 04:24:47 extra-ext4-4k kernel: flags: 0x57fffc000000000(node=1|zone=2|lastcpupid=0x1ffff)
Mar 28 04:24:47 extra-ext4-4k kernel: raw: 057fffc000000000 dead000000000100 dead000000000122 0000000000000000
Mar 28 04:24:47 extra-ext4-4k kernel: raw: 0000000000402a88 0000000000000000 00000000ffffffff 0000000000000000
Mar 28 04:24:47 extra-ext4-4k kernel: page dumped because: Not a kmalloc allocation
Mar 28 04:50:41 extra-ext4-4k kernel: BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!
Mar 28 04:50:41 extra-ext4-4k kernel: turning off the locking correctness validator.
Mar 28 04:50:41 extra-ext4-4k kernel: CPU: 4 UID: 0 PID: 668 Comm: btrfs-transacti Tainted: G        W           6.14.0-rc7-next-20250321-dirty #57 PREEMPT(full) 
Mar 28 04:50:41 extra-ext4-4k kernel: Tainted: [W]=WARN
Mar 28 04:50:41 extra-ext4-4k kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.11-5 01/28/2025
Mar 28 04:50:41 extra-ext4-4k kernel: Call Trace:
Mar 28 04:50:41 extra-ext4-4k kernel:  <TASK>
Mar 28 04:50:41 extra-ext4-4k kernel:  dump_stack_lvl+0x68/0x90
Mar 28 04:50:41 extra-ext4-4k kernel:  __lock_acquire+0x1eaf/0x2210
Mar 28 04:50:41 extra-ext4-4k kernel:  ? __lock_acquire+0xc77/0x2210
Mar 28 04:50:41 extra-ext4-4k kernel:  lock_acquire+0xd1/0x2e0
Mar 28 04:50:41 extra-ext4-4k kernel:  ? put_cpu_partial+0x5f/0x1d0
Mar 28 04:50:41 extra-ext4-4k kernel:  ? lock_acquire+0xe1/0x2e0
Mar 28 04:50:41 extra-ext4-4k kernel:  put_cpu_partial+0x68/0x1d0
Mar 28 04:50:41 extra-ext4-4k kernel:  ? put_cpu_partial+0x5f/0x1d0
Mar 28 04:50:41 extra-ext4-4k kernel:  get_partial_node.part.0+0xde/0x400
Mar 28 04:50:41 extra-ext4-4k kernel:  ___slab_alloc+0x361/0x13c0
Mar 28 04:50:41 extra-ext4-4k kernel:  ? __alloc_object+0x2f/0x240
Mar 28 04:50:41 extra-ext4-4k kernel:  ? mark_held_locks+0x40/0x70
Mar 28 04:50:41 extra-ext4-4k kernel:  ? ___slab_alloc+0x701/0x13c0
Mar 28 04:50:41 extra-ext4-4k kernel:  ? lockdep_hardirqs_on+0x78/0x100
Mar 28 04:50:41 extra-ext4-4k kernel:  ? __alloc_object+0x2f/0x240
Mar 28 04:50:41 extra-ext4-4k kernel:  ? __slab_alloc.isra.0+0x52/0xa0
Mar 28 04:50:41 extra-ext4-4k kernel:  __slab_alloc.isra.0+0x52/0xa0
Mar 28 04:50:41 extra-ext4-4k kernel:  ? __alloc_object+0x2f/0x240
Mar 28 04:50:41 extra-ext4-4k kernel:  kmem_cache_alloc_noprof+0x1e3/0x430
Mar 28 04:50:41 extra-ext4-4k kernel:  ? xas_alloc+0x9f/0xc0
Mar 28 04:50:41 extra-ext4-4k kernel:  __alloc_object+0x2f/0x240
Mar 28 04:50:41 extra-ext4-4k kernel:  __create_object+0x22/0x90
Mar 28 04:50:41 extra-ext4-4k kernel:  ? xas_alloc+0x9f/0xc0
Mar 28 04:50:41 extra-ext4-4k kernel:  kmem_cache_alloc_lru_noprof+0x337/0x430
Mar 28 04:50:41 extra-ext4-4k kernel:  ? __lock_acquire+0x45d/0x2210
Mar 28 04:50:41 extra-ext4-4k kernel:  ? stack_depot_save_flags+0x23/0x9d0
Mar 28 04:50:41 extra-ext4-4k kernel:  xas_alloc+0x9f/0xc0
Mar 28 04:50:41 extra-ext4-4k kernel:  xas_create+0x309/0x6f0
Mar 28 04:50:41 extra-ext4-4k kernel:  xas_store+0x54/0x700
Mar 28 04:50:41 extra-ext4-4k kernel:  __xa_cmpxchg+0xb9/0x140
Mar 28 04:50:41 extra-ext4-4k kernel:  add_delayed_ref+0x11d/0xa50 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel:  btrfs_alloc_tree_block+0x3ea/0x5a0 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel:  split_leaf+0x167/0x6d0 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel:  setup_leaf_for_split+0x19f/0x200 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel:  btrfs_split_item+0x21/0x50 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel:  btrfs_del_csums+0x270/0x3a0 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel:  ? btrfs_csum_root+0x83/0xb0 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel:  __btrfs_free_extent.isra.0+0x5fb/0xcc0 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel:  __btrfs_run_delayed_refs+0x51d/0xf40 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel:  btrfs_run_delayed_refs+0x3d/0x110 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel:  btrfs_commit_transaction+0x8f/0xee0 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel:  ? btrfs_init_block_rsv+0x51/0x60 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel:  ? start_transaction+0x22c/0xaa0 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel:  transaction_kthread+0x152/0x1b0 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel:  ? __pfx_transaction_kthread+0x10/0x10 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel:  kthread+0x107/0x250
Mar 28 04:50:41 extra-ext4-4k kernel:  ? find_held_lock+0x2b/0x80
Mar 28 04:50:41 extra-ext4-4k kernel:  ? ret_from_fork+0x17/0x50
Mar 28 04:50:41 extra-ext4-4k kernel:  ? ret_from_fork+0x17/0x50
Mar 28 04:50:41 extra-ext4-4k kernel:  ? lock_release+0x17d/0x2c0
Mar 28 04:50:41 extra-ext4-4k kernel:  ? __pfx_kthread+0x10/0x10
Mar 28 04:50:41 extra-ext4-4k kernel:  ? __pfx_kthread+0x10/0x10
Mar 28 04:50:41 extra-ext4-4k kernel:  ret_from_fork+0x2d/0x50
Mar 28 04:50:41 extra-ext4-4k kernel:  ? __pfx_kthread+0x10/0x10
Mar 28 04:50:41 extra-ext4-4k kernel:  ret_from_fork_asm+0x1a/0x30
Mar 28 04:50:41 extra-ext4-4k kernel:  </TASK>
Mar 28 05:04:32 extra-ext4-4k kernel: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x20889c pfn:0x4a3e
Mar 28 05:04:32 extra-ext4-4k kernel: flags: 0xffffc000000000(node=0|zone=1|lastcpupid=0x1ffff)
Mar 28 05:04:32 extra-ext4-4k kernel: raw: 00ffffc000000000 fffffc63c041d448 ffff978d7bc347f0 0000000000000000
Mar 28 05:04:32 extra-ext4-4k kernel: raw: 000000000020889c 0000000000000000 00000000ffffffff 0000000000000000
Mar 28 05:04:32 extra-ext4-4k kernel: page dumped because: Not a kmalloc allocation
Mar 28 05:31:13 extra-ext4-4k kernel: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x498b96 pfn:0x76f4
Mar 28 05:31:13 extra-ext4-4k kernel: flags: 0xffffc000000000(node=0|zone=1|lastcpupid=0x1ffff)
Mar 28 05:31:13 extra-ext4-4k kernel: raw: 00ffffc000000000 fffffc63c01d9308 fffffc63c01df648 0000000000000000
Mar 28 05:31:13 extra-ext4-4k kernel: raw: 0000000000498b96 0000000000000000 00000000ffffffff 0000000000000000
Mar 28 05:31:13 extra-ext4-4k kernel: page dumped because: Not a kmalloc allocation
Mar 28 05:57:09 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_get_first_dir_block:3538: inode #5174: comm fsstress: directory missing '.'
Mar 28 06:04:43 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5176: comm fsstress: directory missing '.'
Mar 28 06:05:19 extra-ext4-4k kernel: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x243117 pfn:0x104ddb
Mar 28 06:05:19 extra-ext4-4k kernel: flags: 0x57fffc000000000(node=1|zone=2|lastcpupid=0x1ffff)
Mar 28 06:05:19 extra-ext4-4k kernel: raw: 057fffc000000000 fffffc63c4136fc8 ffff978d7bcb4970 0000000000000000
Mar 28 06:05:19 extra-ext4-4k kernel: raw: 0000000000243117 0000000000000000 00000000ffffffff 0000000000000000
Mar 28 06:05:19 extra-ext4-4k kernel: page dumped because: Not a kmalloc allocation
Mar 28 06:15:16 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5176: comm fsstress: directory missing '.'
Mar 28 06:23:04 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:23:15 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:23:23 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:23:28 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:23:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:24:02 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:24:35 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:30:04 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5173: comm fsstress: directory missing '.'
Mar 28 06:32:30 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5188: comm fsstress: directory missing '.'
Mar 28 06:32:39 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5176: comm fsstress: directory missing '.'
Mar 28 06:38:54 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5173: comm fsstress: directory missing '.'
Mar 28 06:41:37 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5187: comm fsstress: directory missing '.'
Mar 28 06:42:05 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:42:06 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:42:22 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:42:38 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:42:42 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:42:53 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:42:54 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:43:02 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:43:12 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:43:15 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:53:28 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5188: comm fsstress: directory missing '.'
Mar 28 06:54:36 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5176: comm fsstress: directory missing '.'
Mar 28 06:55:07 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:55:09 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:55:12 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 07:04:21 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_get_first_dir_block:3538: inode #5173: comm fsstress: directory missing '.'
Mar 28 07:11:04 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5173: comm fsstress: directory missing '.'
Mar 28 07:13:11 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_get_first_dir_block:3538: inode #5174: comm fsstress: directory missing '.'
Mar 28 07:15:45 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0
Mar 28 07:15:49 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0
Mar 28 07:15:51 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0
Mar 28 07:15:52 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0
Mar 28 07:16:00 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0
Mar 28 07:16:41 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_get_first_dir_block:3538: inode #5187: comm fsstress: directory missing '.'
Mar 28 07:24:00 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0
Mar 28 07:24:31 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0
Mar 28 07:25:40 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8703: comm fsstress: checksumming directory block 0
Mar 28 07:25:47 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8703: comm fsstress: checksumming directory block 0
Mar 28 07:25:50 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8703: comm fsstress: checksumming directory block 0
Mar 28 07:26:18 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8684: comm fsstress: checksumming directory block 0
Mar 28 07:41:04 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_get_first_dir_block:3538: inode #5188: comm fsstress: directory missing '.'
Mar 28 07:41:11 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5174: comm fsstress: checksumming directory block 0
Mar 28 07:44:41 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_get_first_dir_block:3538: inode #5187: comm fsstress: directory missing '.'
Mar 28 07:47:20 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5187: comm fsstress: directory missing '.'
Mar 28 07:47:28 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5187: comm fsstress: directory missing '.'
Mar 28 07:47:56 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0
Mar 28 07:49:05 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5187: comm fsstress: directory missing '.'
Mar 28 07:53:26 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5173: comm fsstress: directory missing '.'
Mar 28 08:16:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0)
Mar 28 08:16:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0)
Mar 28 08:16:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0)
Mar 28 08:16:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0)
Mar 28 08:16:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0)
Mar 28 08:16:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0)
Mar 28 08:16:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0)
Mar 28 08:16:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0)
Mar 28 08:16:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0)
Mar 28 08:16:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0)
Mar 28 08:19:26 extra-ext4-4k kernel: EXT4-fs error: 6 callbacks suppressed
Mar 28 08:19:26 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 08:21:37 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5173: comm fsstress: directory missing '.'
Mar 28 08:28:17 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5173: comm fsstress: directory missing '.'
Mar 28 08:30:17 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0)
Mar 28 08:31:02 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5176: comm fsstress: directory missing '.'
Mar 28 08:32:21 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8699: comm fsstress: checksumming directory block 0
Mar 28 08:32:23 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8699: comm fsstress: checksumming directory block 0
Mar 28 08:32:24 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8699: comm fsstress: checksumming directory block 0
Mar 28 08:32:31 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8699: comm fsstress: checksumming directory block 0
Mar 28 08:32:36 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8699: comm fsstress: checksumming directory block 0
Mar 28 08:32:43 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5176: comm fsstress: directory missing '.'
Mar 28 08:34:47 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5174: comm fsstress: directory missing '.'
Mar 28 08:34:58 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5173: comm fsstress: directory missing '.'
Mar 28 08:35:01 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5187: comm fsstress: directory missing '.'
Mar 28 08:37:11 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8699: comm fsstress: checksumming directory block 0
Mar 28 08:37:12 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8699: comm fsstress: checksumming directory block 0
Mar 28 08:37:14 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5188: comm fsstress: directory missing '.'
Mar 28 08:37:17 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8699: comm fsstress: checksumming directory block 0
Mar 28 08:39:32 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5174: comm fsstress: directory missing '.'
Mar 28 08:40:52 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0
Mar 28 08:40:55 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0
Mar 28 08:41:03 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5173: comm fsstress: directory missing '.'
Mar 28 08:54:04 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_get_first_dir_block:3538: inode #5187: comm fsstress: directory missing '.'
Mar 28 08:58:02 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5188: comm fsstress: directory missing '.'
Mar 28 09:00:10 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_get_first_dir_block:3538: inode #5174: comm fsstress: directory missing '.'
Mar 28 09:01:30 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5174: comm fsstress: checksumming directory block 0
Mar 28 09:04:55 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5176: comm fsstress: directory missing '.'
Mar 28 09:05:48 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5188: comm fsstress: directory missing '.'
Mar 28 09:07:16 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0
Mar 28 09:07:21 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0
Mar 28 09:07:31 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0
Mar 28 09:07:33 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 09:07:34 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 09:07:42 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 09:07:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 09:07:49 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 09:13:23 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5188: comm fsstress: checksumming directory block 0
Mar 28 09:13:44 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5188: comm fsstress: checksumming directory block 0
Mar 28 09:13:56 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5188: comm fsstress: checksumming directory block 0
Mar 28 09:14:06 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5188: comm fsstress: checksumming directory block 0
Mar 28 09:14:33 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5187: comm fsstress: checksumming directory block 0
Mar 28 09:14:35 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5187: comm fsstress: checksumming directory block 0
Mar 28 09:14:50 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5187: comm fsstress: checksumming directory block 0
Mar 28 09:14:51 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5187: comm fsstress: checksumming directory block 0
Mar 28 09:14:53 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5187: comm fsstress: checksumming directory block 0
Mar 28 09:14:54 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5187: comm fsstress: checksumming directory block 0
Mar 28 09:14:55 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5187: comm fsstress: checksumming directory block 0
Mar 28 09:14:56 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5187: comm fsstress: checksumming directory block 0
Mar 28 09:14:57 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5187: comm fsstress: checksumming directory block 0
Mar 28 09:15:00 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5187: comm fsstress: checksumming directory block 0
Mar 28 09:15:11 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5187: comm fsstress: checksumming directory block 0
Mar 28 09:16:55 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0)
Mar 28 09:17:22 extra-ext4-4k kernel: NOHZ tick-stop error: local softirq work is pending, handler #10!!!



* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-28  9:47                                 ` Luis Chamberlain
@ 2025-03-28 19:09                                   ` Luis Chamberlain
  2025-03-29  0:08                                     ` Luis Chamberlain
  0 siblings, 1 reply; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-28 19:09 UTC (permalink / raw)
  To: Jan Kara, Kefeng Wang, Sebastian Andrzej Siewior, David Bueso,
	Tso Ted, Ritesh Harjani
  Cc: Johannes Weiner, Oliver Sang, Matthew Wilcox, David Hildenbrand,
	Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke,
	oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav,
	Daniel Gomez, Dave Chinner, gost.dev, mcgrof

On Fri, Mar 28, 2025 at 02:48:00AM -0700, Luis Chamberlain wrote:
> On Thu, Mar 27, 2025 at 09:21:30PM -0700, Luis Chamberlain wrote:
> > Would the extra ref check added via commit 060913999d7a9e50 ("mm:
> > migrate: support poisoned recover from migrate folio") make the removal
> > of the spin lock safe now given all the buffers are locked from the
> > folio? This survives some basic sanity checks on my end with
> > generic/750 against ext4 and also filling a drive at the same time with
> > fio. I have a feeling we are not sure; do we have a reproducer for
> > the issue reported through ebdf4de5642fb6 ("mm: migrate: fix reference
> > check race between __find_get_block() and migration")? I suspect the
> > answer is no.

Sebastian, David, is there a reason CONFIG_DEBUG_ATOMIC_SLEEP=y won't
trigger an atomic sleeping context warning when cond_resched() is used?
Syzbot and 0-day had ways to reproduce a kernel warning under these
conditions, but this config didn't, and required an explicit might_sleep().

CONFIG_PREEMPT_BUILD=y
CONFIG_ARCH_HAS_PREEMPT_LAZY=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
# CONFIG_PREEMPT_LAZY is not set
# CONFIG_PREEMPT_RT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_RCU=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_DEBUG_PREEMPT=y
CONFIG_PREEMPTIRQ_TRACEPOINTS=y
# CONFIG_PREEMPT_TRACER is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set

Are there some preemption configs under which cond_resched() won't
trigger a kernel splat where expected? The only thing I can think of
is that perhaps some preempt configs don't imply a sleep. If true,
instead of adding might_sleep() to one piece of code (in this case
folio_mc_copy()) I wonder if instead just adding it to cond_resched() may
be useful.

Note that the issue in question wouldn't trigger at all with ext4; the
reports suggesting it happened with btrfs (0-day, with LTP) or with
another test from syzbot were just coincidence on any filesystem. The
only way to really reproduce this was by triggering compaction against
the block device cache, now that we're enabling large folios for the
block device cache, and we've narrowed that down to a simple reproducer
of running

dd if=/dev/zero of=/dev/vde bs=1024M count=1024

and by adding the might_sleep() on folio_mc_copy().
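
As a minimal sketch, the full reproducer looks roughly like this (assuming
/dev/vde is a block device whose logical block size is > PAGE_SIZE; the
compaction sysctl is just one way to force migration of the large folios):

  # populate the block device cache of the LBS device with large folios
  dd if=/dev/zero of=/dev/vde bs=1024M count=1024 &
  # force compaction so the large bdev folios get migrated, as in the splat
  echo 1 > /proc/sys/vm/compact_memory
  wait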

Then, as for the issue we're analyzing, now that I'm back home I think
it's important to highlight that generic/750 seems likely able to
reproduce the original issue reported by commit ebdf4de5642fb6 ("mm:
migrate: fix reference check race between __find_get_block() and migration"),
and that it takes about 3 hours to reproduce, which requires reverting
that commit which added the spin lock:

Mar 28 03:36:37 extra-ext4-4k unknown: run fstests generic/750 at 2025-03-28 03:36:37
<-- snip -->
Mar 28 05:57:09 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_get_first_dir_block:3538: inode #5174: comm fsstress: directory missing '.'
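
For reference, that run is just the stock fstests invocation, assuming an
xfstests checkout with a local.config that sets FSTYP=ext4 and loop-backed
TEST_DEV/SCRATCH_DEV:

  cd xfstests-dev
  ./check generic/750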

Jan, can you confirm if the symptoms match the original report?

It would be good for us to see if the new generic/764 test I am
proposing [0] can reproduce that corruption faster than 3 hours.

If we have a reproducer we can work on evaluating a fix for the
older ext4 issue reported by commit ebdf4de5642fb6 and also on removing
the spin lock from page migration to support large folios.

And lastly, can __find_get_block() avoid running in case of page
migration? Do we have semantics from a filesystem perspective to prevent
work in filesystems going on when page migration on a folio is happening
in atomic context? If not, do we need it?

[0] https://lore.kernel.org/all/20250326185101.2237319-1-mcgrof@kernel.org/T/#u

  Luis



* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-28 19:09                                   ` Luis Chamberlain
@ 2025-03-29  0:08                                     ` Luis Chamberlain
  2025-03-29  1:06                                       ` Luis Chamberlain
  2025-03-31  7:45                                       ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-29  0:08 UTC (permalink / raw)
  To: Jan Kara, Kefeng Wang, Sebastian Andrzej Siewior, David Bueso,
	Tso Ted, Ritesh Harjani
  Cc: Johannes Weiner, Oliver Sang, Matthew Wilcox, David Hildenbrand,
	Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke,
	oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav,
	Daniel Gomez, Dave Chinner, gost.dev

On Fri, Mar 28, 2025 at 12:09:06PM -0700, Luis Chamberlain wrote:
> On Fri, Mar 28, 2025 at 02:48:00AM -0700, Luis Chamberlain wrote:
> > On Thu, Mar 27, 2025 at 09:21:30PM -0700, Luis Chamberlain wrote:
> > > Would the extra ref check added via commit 060913999d7a9e50 ("mm:
> > > migrate: support poisoned recover from migrate folio") make the removal
> > > of the spin lock safe now given all the buffers are locked from the
> > > folio? This survives some basic sanity checks on my end with
> > > generic/750 against ext4 and also filling a drive at the same time with
> > > fio. I have a feeling we are not sure; do we have a reproducer for
> > > the issue reported through ebdf4de5642fb6 ("mm: migrate: fix reference
> > > check race between __find_get_block() and migration")? I suspect the
> > > answer is no.
> 
> Sebastian, David, is there a reason CONFIG_DEBUG_ATOMIC_SLEEP=y won't
> trigger an atomic sleeping context warning when cond_resched() is used?
> Syzbot and 0-day had ways to reproduce a kernel warning under these
> conditions, but this config didn't, and required an explicit might_sleep().
> 
> CONFIG_PREEMPT_BUILD=y
> CONFIG_ARCH_HAS_PREEMPT_LAZY=y
> # CONFIG_PREEMPT_NONE is not set
> # CONFIG_PREEMPT_VOLUNTARY is not set
> CONFIG_PREEMPT=y
> # CONFIG_PREEMPT_LAZY is not set
> # CONFIG_PREEMPT_RT is not set
> CONFIG_PREEMPT_COUNT=y
> CONFIG_PREEMPTION=y
> CONFIG_PREEMPT_DYNAMIC=y
> CONFIG_PREEMPT_RCU=y
> CONFIG_HAVE_PREEMPT_DYNAMIC=y
> CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
> CONFIG_PREEMPT_NOTIFIERS=y
> CONFIG_DEBUG_PREEMPT=y
> CONFIG_PREEMPTIRQ_TRACEPOINTS=y
> # CONFIG_PREEMPT_TRACER is not set
> # CONFIG_PREEMPTIRQ_DELAY_TEST is not set
> 
> Are there some preemption configs under which cond_resched() won't
> trigger a kernel splat where expected? The only thing I can think of
> is that perhaps some preempt configs don't imply a sleep. If true,
> instead of adding might_sleep() to one piece of code (in this case
> folio_mc_copy()) I wonder if instead just adding it to cond_resched() may
> be useful.

I think the answer to the above is "no".

And it took me quite a bit more testing with the below patch to convince
myself of that. Essentially, to trigger the cond_resched() atomic context
kernel warning we'd need to be in atomic context, and today we can only
get there through folio_mc_copy() with large folios.

Today the only atomic context we know of which would end up in page
migration and folio_mc_copy() would be with buffer-head filesystems
which support large folios and which use buffer_migrate_folio_norefs() for
their migrate_folio() callback. The patch we added which enabled the
block device cache to support large folios did this only for cases where the
block size of the backing device is > PAGE_SIZE. So for instance your
qemu guest would need to have a logical block size larger than 4096 on
x86_64. To be clear, ext4 cannot possibly trigger this. No filesystem
can trigger this *case* other than the block device cache, and that
is only possible if block devices have larger block sizes.
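
For illustration only, such a guest device could be created with something
like the following QEMU command line fragment (the file name and the 16k
size are made up; the point is logical_block_size > PAGE_SIZE):

  qemu-system-x86_64 ... \
    -drive file=lbs.img,if=none,id=lbs0,format=raw \
    -device virtio-blk-pci,drive=lbs0,logical_block_size=16384,physical_block_size=16384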

The whole puzzle above about cond_resched() not triggering the atomic
warning is because, in fact, although buffer_migrate_folio_norefs() *does*
always use atomic context to call filemap_migrate_folio(), in practice I'm
not seeing it; that is, we likely bail before we even call folio_mc_copy().

So for instance we can see:

Mar 28 23:22:04 extra-ext4-4k kernel: __buffer_migrate_folio() in_atomic: 1
Mar 28 23:22:04 extra-ext4-4k kernel: __buffer_migrate_folio() in_atomic: 1
Mar 28 23:23:11 extra-ext4-4k kernel: large folios on folio_mc_copy(): 512 in_atomic(): 0
Mar 28 23:23:11 extra-ext4-4k kernel: large folios on folio_mc_copy(): in_atomic(): 0 calling cond_resched()
Mar 28 23:23:11 extra-ext4-4k kernel: large folios on folio_mc_copy(): in_atomic(): 0 calling cond_resched()

diff --git a/block/bdev.c b/block/bdev.c
index 4844d1e27b6f..1db9edfc4bc1 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -147,6 +147,11 @@ static void set_init_blocksize(struct block_device *bdev)
 			break;
 		bsize <<= 1;
 	}
+
+	if (bsize > PAGE_SIZE)
+		printk("%s: LBS device: mapping_set_folio_min_order(%u): %u\n",
+		       bdev->bd_disk->disk_name, get_order(bsize), bsize);
+
 	BD_INODE(bdev)->i_blkbits = blksize_bits(bsize);
 	mapping_set_folio_min_order(BD_INODE(bdev)->i_mapping,
 				    get_order(bsize));
diff --git a/mm/migrate.c b/mm/migrate.c
index f3ee6d8d5e2e..210df4970573 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -851,6 +851,7 @@ static int __buffer_migrate_folio(struct address_space *mapping,
 recheck_buffers:
 		busy = false;
 		spin_lock(&mapping->i_private_lock);
+		printk("__buffer_migrate_folio() in_atomic: %d\n", in_atomic());
 		bh = head;
 		do {
 			if (atomic_read(&bh->b_count)) {
@@ -871,6 +872,8 @@ static int __buffer_migrate_folio(struct address_space *mapping,
 		}
 	}
 
+	if (check_refs)
+		printk("__buffer_migrate_folio() calling filemap_migrate_folio() in_atomic: %d\n", in_atomic());
 	rc = filemap_migrate_folio(mapping, dst, src, mode);
 	if (rc != MIGRATEPAGE_SUCCESS)
 		goto unlock_buffers;
diff --git a/mm/util.c b/mm/util.c
index 448117da071f..61c76712d4bb 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -735,11 +735,15 @@ int folio_mc_copy(struct folio *dst, struct folio *src)
 	long nr = folio_nr_pages(src);
 	long i = 0;
 
+	if (nr > 1)
+		printk("large folios on folio_mc_copy(): %lu in_atomic(): %d\n", nr, in_atomic());
+
 	for (;;) {
 		if (copy_mc_highpage(folio_page(dst, i), folio_page(src, i)))
 			return -EHWPOISON;
 		if (++i == nr)
 			break;
+		printk("large folios on folio_mc_copy(): in_atomic(): %d calling cond_resched()\n", in_atomic());
 		cond_resched();
 	}


And so effectively, it is true, cond_resched() is not in atomic context
above, even though filemap_migrate_folio() is certainly being called
in atomic context. What changes in between is that the folios likely won't
migrate due to later checks in filemap_migrate_folio(), like the new
ref check, and instead we end up with page migration later of a huge
page, and *that* is not in atomic context.

So, to be clear, I *still* cannot reproduce the original reports, even
though in theory it is evident how buffer_migrate_folio_norefs() *can*
call filemap_migrate_folio() in atomic context.

How 0-day and syzbot triggered this *without* a large block size block
device is perplexing to me, if it is true that one was not used.

How we still can't reproduce in_atomic() context in folio_mc_copy() is
another fun mystery.

That is to say, I can't see how the existing code could regress here,
given that the only buffer-head filesystem which enables large folios
is the pseudo block device cache filesystem, and you'll only get LBS
devices if the logical block size is > PAGE_SIZE.

Despite all this, we have two separate reports and no clear information
on whether a large block size device was in use or not, and so, given the
traces above, to help root out more bugs with large folios we should just
proactively add might_sleep() to __migrate_folio(). I'll send a patch
for that; it'll enhance our test coverage.

The reason why we are likely having a hard time reproducing the issue is
this new check:

	/* Check whether src does not have extra refs before we do more work */
	if (folio_ref_count(src) != expected_count)
		return -EAGAIN;

So, moving on, I think what's best is to see how we can get __find_get_block()
to not chug on during page migration.

  Luis



* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-29  0:08                                     ` Luis Chamberlain
@ 2025-03-29  1:06                                       ` Luis Chamberlain
  2025-03-31  7:45                                       ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-29  1:06 UTC (permalink / raw)
  To: Jan Kara, Kefeng Wang, Sebastian Andrzej Siewior, David Bueso,
	Tso Ted, Ritesh Harjani
  Cc: Johannes Weiner, Oliver Sang, Matthew Wilcox, David Hildenbrand,
	Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke,
	oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav,
	Daniel Gomez, Dave Chinner, gost.dev

On Fri, Mar 28, 2025 at 05:08:40PM -0700, Luis Chamberlain wrote:
> So, moving on, I think what's best is to see how we can get __find_get_block()
> to not chug on during page migration.

Something like this maybe? Passes initial 10 minutes of generic/750
on ext4 while also blasting an LBS device with dd. I'll let it soak.
The second patch is what requires more eyeballs / suggestions / ideas.

From 86b2315f3c80dd4562a1a0fa0734921d3e92398f Mon Sep 17 00:00:00 2001
From: Luis Chamberlain <mcgrof@kernel.org>
Date: Fri, 28 Mar 2025 17:12:48 -0700
Subject: [PATCH 1/3] mm/migrate: add might_sleep() on __migrate_folio()

When we do page migration of large folios, folio_mc_copy() can
cond_resched() *iff* we are on a large folio. There's a hairy
bug reported by both 0-day [0] and syzbot [1] where it has been
detected that we can call folio_mc_copy() in atomic context.
Technically speaking, that should in theory only be possible today
from buffer-head filesystems using buffer_migrate_folio_norefs()
on page migration, and the only buffer-head large folio filesystem
is the block device cache, and so only with block devices with large
block sizes. However, tracing shows that folio_mc_copy() *isn't* being
called as often as we'd expect from the buffer_migrate_folio_norefs()
path, as we're likely bailing early now thanks to the check added by
commit 060913999d7a ("mm: migrate: support poisoned recover from migrate
folio").

*Most* folio_mc_copy() calls in turn end up *not* being in atomic
context, and so we won't hit a splat when using:

CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_ATOMIC_SLEEP=y

But we *want* to help proactively find callers of __migrate_folio() in
atomic context, so make might_sleep() explicit to help us root out
large folio atomic callers of migrate_folio().

Link: https://lkml.kernel.org/r/202503101536.27099c77-lkp@intel.com # [0]
Link: https://lkml.kernel.org/r/67e57c41.050a0220.2f068f.0033.GAE@google.com # [1]
Link: https://lkml.kernel.org/r/Z-c6BqCSmAnNxb57@bombadil.infradead.org # [2]
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 mm/migrate.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index f3ee6d8d5e2e..712ddd11f3f0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -751,6 +751,8 @@ static int __migrate_folio(struct address_space *mapping, struct folio *dst,
 {
 	int rc, expected_count = folio_expected_refs(mapping, src);
 
+	might_sleep();
+
 	/* Check whether src does not have extra refs before we do more work */
 	if (folio_ref_count(src) != expected_count)
 		return -EAGAIN;
-- 
2.47.2


From 561e94951fce481bb2e5917230bec7008c131d9a Mon Sep 17 00:00:00 2001
From: Luis Chamberlain <mcgrof@kernel.org>
Date: Fri, 28 Mar 2025 17:44:10 -0700
Subject: [PATCH 2/3] fs/buffer: avoid getting buffer if it is folio migration
 candidate

Avoid giving away a buffer with __find_get_block_slow() if the
folio may be a folio migration candidate. We do this as an alternative
to the fix in commit ebdf4de5642fb6 ("mm: migrate: fix reference
check race between __find_get_block() and migration"), given we've
determined that we should avoid requiring folio migration callers
to hold a spin lock while calling __migrate_folio().

This alternative simply avoids completing __find_get_block_slow()
on folio migration candidates to let us later rip out the spin_lock()
held on the buffer_migrate_folio_norefs() path.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 fs/buffer.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/buffer.c b/fs/buffer.c
index c7abb4a029dc..6e2c3837a202 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -208,6 +208,12 @@ __find_get_block_slow(struct block_device *bdev, sector_t block)
 	head = folio_buffers(folio);
 	if (!head)
 		goto out_unlock;
+
+	if (folio_test_lru(folio) &&
+	    folio_test_locked(folio) &&
+	    !folio_test_writeback(folio))
+		goto out_unlock;
+
 	bh = head;
 	do {
 		if (!buffer_mapped(bh))
-- 
2.47.2


From af6963b73a8406162e6c2223fae600a799402e2b Mon Sep 17 00:00:00 2001
From: Luis Chamberlain <mcgrof@kernel.org>
Date: Fri, 28 Mar 2025 17:51:39 -0700
Subject: [PATCH 3/3] mm/migrate: avoid atomic context on
 buffer_migrate_folio_norefs() migration

buffer_migrate_folio_norefs() should avoid holding the spin lock
in order to ensure we can support large folios. The prior commit
"fs/buffer: avoid getting buffer if it is folio migration candidate"
ripped out the only rationale for having the atomic context, so we can
remove the spin lock call now.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 mm/migrate.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 712ddd11f3f0..f3047c685706 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -861,12 +861,12 @@ static int __buffer_migrate_folio(struct address_space *mapping,
 			}
 			bh = bh->b_this_page;
 		} while (bh != head);
+		spin_unlock(&mapping->i_private_lock);
 		if (busy) {
 			if (invalidated) {
 				rc = -EAGAIN;
 				goto unlock_buffers;
 			}
-			spin_unlock(&mapping->i_private_lock);
 			invalidate_bh_lrus();
 			invalidated = true;
 			goto recheck_buffers;
@@ -884,8 +884,6 @@ static int __buffer_migrate_folio(struct address_space *mapping,
 	} while (bh != head);
 
 unlock_buffers:
-	if (check_refs)
-		spin_unlock(&mapping->i_private_lock);
 	bh = head;
 	do {
 		unlock_buffer(bh);
-- 
2.47.2




* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-29  0:08                                     ` Luis Chamberlain
  2025-03-29  1:06                                       ` Luis Chamberlain
@ 2025-03-31  7:45                                       ` Sebastian Andrzej Siewior
  2025-04-08 16:43                                         ` Darrick J. Wong
  1 sibling, 1 reply; 31+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-03-31  7:45 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Jan Kara, Kefeng Wang, David Bueso, Tso Ted, Ritesh Harjani,
	Johannes Weiner, Oliver Sang, Matthew Wilcox, David Hildenbrand,
	Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke,
	oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav,
	Daniel Gomez, Dave Chinner, gost.dev

On 2025-03-28 17:08:38 [-0700], Luis Chamberlain wrote:
…
> > Are there some preemption configs under which cond_resched() won't
> > trigger a kernel splat where expected? The only thing I can think of
> > is that perhaps some preempt configs don't imply a sleep. If true,
> > instead of adding might_sleep() to one piece of code (in this case
> > folio_mc_copy()) I wonder if instead just adding it to cond_resched() may
> > be useful.
> 
> I think the answer to the above is "no".

I would say so. You need CONFIG_DEBUG_ATOMIC_SLEEP for the might-sleep
magic to work. And then the splat from might_sleep() isn't different
than the one from cond_resched(). 

> 
>   Luis

Sebastian



* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-31  7:45                                       ` Sebastian Andrzej Siewior
@ 2025-04-08 16:43                                         ` Darrick J. Wong
  2025-04-08 17:06                                           ` Luis Chamberlain
  0 siblings, 1 reply; 31+ messages in thread
From: Darrick J. Wong @ 2025-04-08 16:43 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Jan Kara, Kefeng Wang, David Bueso, Tso Ted, Ritesh Harjani,
	Johannes Weiner, Oliver Sang, Matthew Wilcox, David Hildenbrand,
	Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke,
	oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav,
	Daniel Gomez, Dave Chinner, gost.dev, linux-fsdevel

Hi Luis,

I'm not sure if this is related, but I'm seeing the same "BUG: sleeping
function called from invalid context at mm/util.c:743" message when
running fstests on XFS.  Nothing exciting with fstests here other than
the machine is arm64 with 64k basepages and 4k fsblock size:

MKFS_OPTIONS="-m metadir=1,autofsck=1,uquota,gquota,pquota"

--D

[18182.889554] run fstests generic/457 at 2025-04-07 23:06:25
[18182.973535] spectre-v4 mitigation disabled by command-line option
[18184.849467] XFS (sda3): EXPERIMENTAL metadata directory tree feature enabled.  Use at your own risk!
[18184.852941] XFS (sda3): EXPERIMENTAL exchange range feature enabled.  Use at your own risk!
[18184.852962] XFS (sda3): EXPERIMENTAL parent pointer feature enabled.  Use at your own risk!
[18184.858065] XFS (sda3): Mounting V5 Filesystem 13d8c72d-ddac-4052-8d3c-a82c4ce0377d
[18184.900002] XFS (sda3): Ending clean mount
[18184.905990] XFS (sda3): Quotacheck needed: Please wait.
[18184.919801] XFS (sda3): Quotacheck: Done.
[18184.954170] XFS (sda3): Unmounting Filesystem 13d8c72d-ddac-4052-8d3c-a82c4ce0377d
[18186.165572] XFS (dm-4): EXPERIMENTAL metadata directory tree feature enabled.  Use at your own risk!
[18186.165601] XFS (dm-4): EXPERIMENTAL exchange range feature enabled.  Use at your own risk!
[18186.165608] XFS (dm-4): EXPERIMENTAL parent pointer feature enabled.  Use at your own risk!
[18186.169589] XFS (dm-4): Mounting V5 Filesystem 6ade490d-15b0-43e5-9f17-db534769c746
[18187.121289] XFS (dm-4): Ending clean mount
[18187.131797] XFS (dm-4): Quotacheck needed: Please wait.
[18187.145700] XFS (dm-4): Quotacheck: Done.
[18187.393486] XFS (dm-4): Unmounting Filesystem 6ade490d-15b0-43e5-9f17-db534769c746
[18190.592061] XFS (dm-3): EXPERIMENTAL metadata directory tree feature enabled.  Use at your own risk!
[18190.592083] XFS (dm-3): EXPERIMENTAL exchange range feature enabled.  Use at your own risk!
[18190.592089] XFS (dm-3): EXPERIMENTAL parent pointer feature enabled.  Use at your own risk!
[18190.601815] XFS (dm-3): Mounting V5 Filesystem 6ade490d-15b0-43e5-9f17-db534769c746
[18190.744215] XFS (dm-3): Starting recovery (logdev: internal)
[18190.807553] XFS (dm-3): Ending recovery (logdev: internal)
[18190.818708] XFS (dm-3): Unmounting Filesystem 6ade490d-15b0-43e5-9f17-db534769c746
[18193.786621] XFS (dm-3): EXPERIMENTAL metadata directory tree feature enabled.  Use at your own risk!
[18193.788879] XFS (dm-3): EXPERIMENTAL exchange range feature enabled.  Use at your own risk!
[18193.788882] XFS (dm-3): EXPERIMENTAL parent pointer feature enabled.  Use at your own risk!
[18193.790518] XFS (dm-3): Mounting V5 Filesystem 6ade490d-15b0-43e5-9f17-db534769c746
[18193.877969] XFS (dm-3): Starting recovery (logdev: internal)
[18193.917688] XFS (dm-3): Ending recovery (logdev: internal)
[18193.945675] XFS (dm-3): Unmounting Filesystem 6ade490d-15b0-43e5-9f17-db534769c746
[18196.985726] XFS (dm-3): EXPERIMENTAL metadata directory tree feature enabled.  Use at your own risk!
[18196.988868] XFS (dm-3): EXPERIMENTAL exchange range feature enabled.  Use at your own risk!
[18196.988873] XFS (dm-3): EXPERIMENTAL parent pointer feature enabled.  Use at your own risk!
[18196.998845] XFS (dm-3): Mounting V5 Filesystem 6ade490d-15b0-43e5-9f17-db534769c746
[18197.193740] XFS (dm-3): Starting recovery (logdev: internal)
[18197.254119] XFS (dm-3): Ending recovery (logdev: internal)
[18197.280596] XFS (dm-3): Unmounting Filesystem 6ade490d-15b0-43e5-9f17-db534769c746
[18200.173003] XFS (dm-3): EXPERIMENTAL metadata directory tree feature enabled.  Use at your own risk!
[18200.176855] XFS (dm-3): EXPERIMENTAL exchange range feature enabled.  Use at your own risk!
[18200.176859] XFS (dm-3): EXPERIMENTAL parent pointer feature enabled.  Use at your own risk!
[18200.185721] XFS (dm-3): Mounting V5 Filesystem 6ade490d-15b0-43e5-9f17-db534769c746
[18200.370893] XFS (dm-3): Starting recovery (logdev: internal)
[18200.430454] XFS (dm-3): Ending recovery (logdev: internal)
[18200.462036] XFS (dm-3): Unmounting Filesystem 6ade490d-15b0-43e5-9f17-db534769c746
[18203.311440] XFS (dm-3): EXPERIMENTAL metadata directory tree feature enabled.  Use at your own risk!
[18203.311454] XFS (dm-3): EXPERIMENTAL exchange range feature enabled.  Use at your own risk!
[18203.311464] XFS (dm-3): EXPERIMENTAL parent pointer feature enabled.  Use at your own risk!
[18203.324374] XFS (dm-3): Mounting V5 Filesystem 6ade490d-15b0-43e5-9f17-db534769c746
[18203.437989] XFS (dm-3): Starting recovery (logdev: internal)
[18203.491993] XFS (dm-3): Ending recovery (logdev: internal)
[18203.517090] XFS (dm-3): Unmounting Filesystem 6ade490d-15b0-43e5-9f17-db534769c746
[18206.442639] XFS (dm-3): EXPERIMENTAL metadata directory tree feature enabled.  Use at your own risk!
[18206.444851] XFS (dm-3): EXPERIMENTAL exchange range feature enabled.  Use at your own risk!
[18206.444854] XFS (dm-3): EXPERIMENTAL parent pointer feature enabled.  Use at your own risk!
[18206.455415] XFS (dm-3): Mounting V5 Filesystem 6ade490d-15b0-43e5-9f17-db534769c746
[18206.600488] XFS (dm-3): Starting recovery (logdev: internal)
[18206.642538] XFS (dm-3): Ending recovery (logdev: internal)
[18206.673822] XFS (dm-3): Unmounting Filesystem 6ade490d-15b0-43e5-9f17-db534769c746
[18209.666477] XFS (dm-3): EXPERIMENTAL metadata directory tree feature enabled.  Use at your own risk!
[18209.678778] XFS (dm-3): EXPERIMENTAL exchange range feature enabled.  Use at your own risk!
[18209.678782] XFS (dm-3): EXPERIMENTAL parent pointer feature enabled.  Use at your own risk!
[18209.690805] XFS (dm-3): Mounting V5 Filesystem 6ade490d-15b0-43e5-9f17-db534769c746
[18209.859688] XFS (dm-3): Starting recovery (logdev: internal)
[18209.923426] XFS (dm-3): Ending recovery (logdev: internal)
[18209.947181] XFS (dm-3): Unmounting Filesystem 6ade490d-15b0-43e5-9f17-db534769c746
[18212.920991] XFS (dm-3): EXPERIMENTAL metadata directory tree feature enabled.  Use at your own risk!
[18212.921001] XFS (dm-3): EXPERIMENTAL exchange range feature enabled.  Use at your own risk!
[18212.921012] XFS (dm-3): EXPERIMENTAL parent pointer feature enabled.  Use at your own risk!
[18212.925332] XFS (dm-3): Mounting V5 Filesystem 6ade490d-15b0-43e5-9f17-db534769c746
[18213.067578] XFS (dm-3): Starting recovery (logdev: internal)
[18213.138633] XFS (dm-3): Ending recovery (logdev: internal)
[18213.161827] XFS (dm-3): Unmounting Filesystem 6ade490d-15b0-43e5-9f17-db534769c746
[18216.154862] XFS (dm-3): EXPERIMENTAL metadata directory tree feature enabled.  Use at your own risk!
[18216.156952] XFS (dm-3): EXPERIMENTAL exchange range feature enabled.  Use at your own risk!
[18216.157070] XFS (dm-3): EXPERIMENTAL parent pointer feature enabled.  Use at your own risk!
[18216.161145] XFS (dm-3): Mounting V5 Filesystem 6ade490d-15b0-43e5-9f17-db534769c746
[18216.333087] XFS (dm-3): Starting recovery (logdev: internal)
[18216.389192] XFS (dm-3): Ending recovery (logdev: internal)
[18216.410647] XFS (dm-3): Unmounting Filesystem 6ade490d-15b0-43e5-9f17-db534769c746
[18217.949035] BUG: sleeping function called from invalid context at mm/util.c:743
[18217.949047] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 35, name: kcompactd0
[18217.949056] preempt_count: 1, expected: 0
[18217.949058] RCU nest depth: 0, expected: 0
[18217.949060] Preemption disabled at:
[18217.949062] [<fffffe0080339c98>] __buffer_migrate_folio+0xb8/0x2d0
[18217.949070] CPU: 0 UID: 0 PID: 35 Comm: kcompactd0 Not tainted 6.15.0-rc1-acha #rc1 PREEMPT  92ec4d9d73adc951fe6bbe0d3f3b75d35d67fded
[18217.949074] Hardware name: QEMU KVM Virtual Machine, BIOS 1.6.6 08/22/2023
[18217.949075] Call trace:
[18217.949076]  show_stack+0x20/0x38 (C)
[18217.949080]  dump_stack_lvl+0x78/0x90
[18217.949083]  dump_stack+0x18/0x28
[18217.949084]  __might_resched+0x164/0x1d0
[18217.949086]  folio_mc_copy+0x5c/0xa0
[18217.949089]  __migrate_folio.constprop.0+0x70/0x1c8
[18217.949092]  __buffer_migrate_folio+0x2bc/0x2d0
[18217.949094]  buffer_migrate_folio_norefs+0x1c/0x30
[18217.949096]  move_to_new_folio+0x70/0x1f0
[18217.949099]  migrate_pages_batch+0x9c4/0xf20
[18217.949101]  migrate_pages+0xb74/0xde8
[18217.949103]  compact_zone+0x9ac/0xff0
[18217.949105]  compact_node+0x9c/0x1a0
[18217.949107]  kcompactd+0x38c/0x400
[18217.949108]  kthread+0x144/0x210
[18217.949110]  ret_from_fork+0x10/0x20



* Re: [linux-next:master] [block/bdev]  3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-04-08 16:43                                         ` Darrick J. Wong
@ 2025-04-08 17:06                                           ` Luis Chamberlain
  2025-04-08 17:24                                             ` Luis Chamberlain
  0 siblings, 1 reply; 31+ messages in thread
From: Luis Chamberlain @ 2025-04-08 17:06 UTC (permalink / raw)
  To: Darrick J. Wong, David Bueso
  Cc: Jan Kara, Kefeng Wang, David Bueso, Tso Ted, Ritesh Harjani,
	Johannes Weiner, Oliver Sang, Matthew Wilcox, David Hildenbrand,
	Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke,
	oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav,
	Daniel Gomez, Dave Chinner, gost.dev, linux-fsdevel

On Tue, Apr 08, 2025 at 09:43:07AM -0700, Darrick J. Wong wrote:
> Hi Luis,
> 
> I'm not sure if this is related, but I'm seeing the same "BUG: sleeping
> function called from invalid context at mm/util.c:743" message when
> running fstests on XFS.  Nothing exciting with fstests here other than
> the machine is arm64 with 64k basepages and 4k fsblock size:

How exotic :D

> MKFS_OPTIONS="-m metadir=1,autofsck=1,uquota,gquota,pquota"
> 
> --D
> 
> [18182.889554] run fstests generic/457 at 2025-04-07 23:06:25

Davidlohr and I have some fixes brewed up now; before we post them we just
want to run one more test for metrics on success rate analysis for folio
migration. Other than that, given the exotic nature of your system we'll
Cc you on preliminary patches, in case you can test to see if it also
fixes your issue. It should, given your splat is on the buffer-head side
of things! See the __buffer_migrate_folio() reference in the splat. Fun
puzzle for the community is figuring out *why* oh why did a large folio
end up being used on buffer-heads for your use case *without* an LBS
device (logical block size) being present, as I assume you didn't have
one, ie say a nvme or virtio block device with logical block size  >
PAGE_SIZE. The area in question would trigger on folio migration *only*
if you are migrating large buffer-head folios. We only create those if
you have an LBS device and are leveraging the block device cache or a
filesystem with buffer-heads with LBS (they don't exist yet other than
the block device cache).
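
A quick way to rule that in or out is to check the queue limits in sysfs
(device name below is just a placeholder):

  cat /sys/block/nvme0n1/queue/logical_block_size  # LBS if this is > PAGE_SIZE
  getconf PAGE_SIZE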

Regardless of the puzzle described above, the patches we have brewed up
should fix this. We'll cc you for testing before we post them.

  Luis



* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-04-08 17:06                                           ` Luis Chamberlain
@ 2025-04-08 17:24                                             ` Luis Chamberlain
  2025-04-08 17:48                                               ` Darrick J. Wong
  0 siblings, 1 reply; 31+ messages in thread
From: Luis Chamberlain @ 2025-04-08 17:24 UTC (permalink / raw)
  To: Darrick J. Wong, David Bueso
  Cc: Jan Kara, Kefeng Wang, Tso Ted, Ritesh Harjani, Johannes Weiner,
	Oliver Sang, Matthew Wilcox, David Hildenbrand, Alistair Popple,
	linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp,
	John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez,
	Dave Chinner, gost.dev, linux-fsdevel

On Tue, Apr 8, 2025 at 10:06 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
> Fun
> puzzle for the community is figuring out *why* oh why did a large folio
> end up being used on buffer-heads for your use case *without* an LBS
> device (logical block size) being present, as I assume you didn't have
> one, ie say a nvme or virtio block device with logical block size  >
> PAGE_SIZE. The area in question would trigger on folio migration *only*
> if you are migrating large buffer-head folios. We only create those

To be clear, large folios for buffer-heads.

> if
> you have an LBS device and are leveraging the block device cache or a
> filesystem with buffer-heads with LBS (they don't exist yet other than
> the block device cache).

  Luis



* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-04-08 17:24                                             ` Luis Chamberlain
@ 2025-04-08 17:48                                               ` Darrick J. Wong
  2025-04-08 17:51                                                 ` Matthew Wilcox
  2025-04-08 18:06                                                 ` Luis Chamberlain
  0 siblings, 2 replies; 31+ messages in thread
From: Darrick J. Wong @ 2025-04-08 17:48 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: David Bueso, Jan Kara, Kefeng Wang, Tso Ted, Ritesh Harjani,
	Johannes Weiner, Oliver Sang, Matthew Wilcox, David Hildenbrand,
	Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke,
	oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav,
	Daniel Gomez, Dave Chinner, gost.dev, linux-fsdevel

On Tue, Apr 08, 2025 at 10:24:40AM -0700, Luis Chamberlain wrote:
> On Tue, Apr 8, 2025 at 10:06 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
> > Fun
> > puzzle for the community is figuring out *why* oh why did a large folio
> > end up being used on buffer-heads for your use case *without* an LBS
> > device (logical block size) being present, as I assume you didn't have
> > one, ie say a nvme or virtio block device with logical block size  >
> > PAGE_SIZE. The area in question would trigger on folio migration *only*
> > if you are migrating large buffer-head folios. We only create those
> 
> To be clear, large folios for buffer-heads.
> > if
> > you have an LBS device and are leveraging the block device cache or a
> > filesystem with buffer-heads with LBS (they don't exist yet other than
> > the block device cache).

My guess is that udev or something tries to read the disk label in
response to some uevent (mkfs, mount, unmount, etc), which creates a
large folio because min_order > 0, and attaches a buffer head.  There's
a separate crash report that I'll cc you on.
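
For reference, that kind of probe is as simple as the following, which does
buffered reads of the label area through the bdev page cache (device path
is just an example):

  blkid /dev/sda3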

--D



* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-04-08 17:48                                               ` Darrick J. Wong
@ 2025-04-08 17:51                                                 ` Matthew Wilcox
  2025-04-08 18:02                                                   ` Darrick J. Wong
  2025-04-08 18:06                                                 ` Luis Chamberlain
  1 sibling, 1 reply; 31+ messages in thread
From: Matthew Wilcox @ 2025-04-08 17:51 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Luis Chamberlain, David Bueso, Jan Kara, Kefeng Wang, Tso Ted,
	Ritesh Harjani, Johannes Weiner, Oliver Sang, David Hildenbrand,
	Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke,
	oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav,
	Daniel Gomez, Dave Chinner, gost.dev, linux-fsdevel

On Tue, Apr 08, 2025 at 10:48:55AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 08, 2025 at 10:24:40AM -0700, Luis Chamberlain wrote:
> > On Tue, Apr 8, 2025 at 10:06 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
> > > Fun
> > > puzzle for the community is figuring out *why* oh why did a large folio
> > > end up being used on buffer-heads for your use case *without* an LBS
> > > device (logical block size) being present, as I assume you didn't have
> > > one, ie say a nvme or virtio block device with logical block size  >
> > > PAGE_SIZE. The area in question would trigger on folio migration *only*
> > > if you are migrating large buffer-head folios. We only create those
> > 
> > To be clear, large folios for buffer-heads.
> > > if
> > > you have an LBS device and are leveraging the block device cache or a
> > > filesystem with buffer-heads with LBS (they don't exist yet other than
> > > the block device cache).
> 
> My guess is that udev or something tries to read the disk label in
> response to some uevent (mkfs, mount, unmount, etc), which creates a
> large folio because min_order > 0, and attaches a buffer head.  There's
> a separate crash report that I'll cc you on.

But you said:

> the machine is arm64 with 64k basepages and 4k fsblock size:

so that shouldn't be using large folios because you should have set the
order to 0.  Right?  Or did you mis-speak and use a 4K PAGE_SIZE kernel
with a 64k fsblocksize?



* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-04-08 17:51                                                 ` Matthew Wilcox
@ 2025-04-08 18:02                                                   ` Darrick J. Wong
  2025-04-08 18:51                                                     ` Matthew Wilcox
  0 siblings, 1 reply; 31+ messages in thread
From: Darrick J. Wong @ 2025-04-08 18:02 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Luis Chamberlain, David Bueso, Jan Kara, Kefeng Wang, Tso Ted,
	Ritesh Harjani, Johannes Weiner, Oliver Sang, David Hildenbrand,
	Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke,
	oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav,
	Daniel Gomez, Dave Chinner, gost.dev, linux-fsdevel

On Tue, Apr 08, 2025 at 06:51:14PM +0100, Matthew Wilcox wrote:
> On Tue, Apr 08, 2025 at 10:48:55AM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 08, 2025 at 10:24:40AM -0700, Luis Chamberlain wrote:
> > > On Tue, Apr 8, 2025 at 10:06 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
> > > > Fun
> > > > puzzle for the community is figuring out *why* oh why did a large folio
> > > > end up being used on buffer-heads for your use case *without* an LBS
> > > > device (logical block size) being present, as I assume you didn't have
> > > > one, ie say a nvme or virtio block device with logical block size  >
> > > > PAGE_SIZE. The area in question would trigger on folio migration *only*
> > > > if you are migrating large buffer-head folios. We only create those
> > > 
> > > To be clear, large folios for buffer-heads.
> > > > if
> > > > you have an LBS device and are leveraging the block device cache or a
> > > > filesystem with buffer-heads with LBS (they don't exist yet other than
> > > > the block device cache).
> > 
> > My guess is that udev or something tries to read the disk label in
> > response to some uevent (mkfs, mount, unmount, etc), which creates a
> > large folio because min_order > 0, and attaches a buffer head.  There's
> > a separate crash report that I'll cc you on.
> 
> But you said:
> 
> > the machine is arm64 with 64k basepages and 4k fsblock size:
> 
> so that shouldn't be using large folios because you should have set the
> order to 0.  Right?  Or did you mis-speak and use a 4K PAGE_SIZE kernel
> with a 64k fsblocksize?

This particular kernel warning is arm64 with 64k base pages and a 4k
fsblock size, and my suspicion is that udev/libblkid are creating the
buffer heads or something weird like that.

On x64 with 4k base pages, xfs/032 creates a filesystem with 64k sector
size and there's an actual kernel crash resulting from a udev worker:
https://lore.kernel.org/linux-fsdevel/20250408175125.GL6266@frogsfrogsfrogs/T/#u

So I didn't misspeak, I just have two problems.  I actually have four
problems, but the others are loop device behavior changes.

--D



* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-04-08 17:48                                               ` Darrick J. Wong
  2025-04-08 17:51                                                 ` Matthew Wilcox
@ 2025-04-08 18:06                                                 ` Luis Chamberlain
  1 sibling, 0 replies; 31+ messages in thread
From: Luis Chamberlain @ 2025-04-08 18:06 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: David Bueso, Jan Kara, Kefeng Wang, Tso Ted, Ritesh Harjani,
	Johannes Weiner, Oliver Sang, Matthew Wilcox, David Hildenbrand,
	Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke,
	oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav,
	Daniel Gomez, Dave Chinner, gost.dev, linux-fsdevel

On Tue, Apr 08, 2025 at 10:48:55AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 08, 2025 at 10:24:40AM -0700, Luis Chamberlain wrote:
> > On Tue, Apr 8, 2025 at 10:06 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
> > > Fun
> > > puzzle for the community is figuring out *why* oh why did a large folio
> > > end up being used on buffer-heads for your use case *without* an LBS
> > > device (logical block size) being present, as I assume you didn't have
> > > one, ie say a nvme or virtio block device with logical block size  >
> > > PAGE_SIZE. The area in question would trigger on folio migration *only*
> > > if you are migrating large buffer-head folios. We only create those
> > 
> > To be clear, large folios for buffer-heads.
> > > if
> > > you have an LBS device and are leveraging the block device cache or a
> > > filesystem with buffer-heads with LBS (they don't exist yet other than
> > > the block device cache).
> 
> My guess is that udev or something tries to read the disk label in
> response to some uevent (mkfs, mount, unmount, etc), which creates a
> large folio because min_order > 0, and attaches a buffer head.  There's
> a separate crash report that I'll cc you on.

OK, so as willy pointed out, I'd buy that for x86_64 *iff* we already
have opportunistic large folio support for the buffer-head read/write
path. But I don't think we enable large folios on the block device
cache aops yet unless we have a min-order block device... so what
gives?
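
To spell out my assumption -- this is a sketch from memory of the LBS
wiring in block/bdev.c, not a verbatim copy of the tree:

	/*
	 * The minimum folio order for the bdev mapping is derived from the
	 * logical block size, so anything <= PAGE_SIZE gives min order 0
	 * and, I assumed, no large folios on the block device cache.
	 */
	unsigned int min_order = get_order(bdev_logical_block_size(bdev));

	mapping_set_folio_min_order(BD_INODE(bdev)->i_mapping, min_order);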

  Luis


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-04-08 18:02                                                   ` Darrick J. Wong
@ 2025-04-08 18:51                                                     ` Matthew Wilcox
  2025-04-08 19:13                                                       ` Luis Chamberlain
  2025-04-08 19:13                                                       ` Luis Chamberlain
  0 siblings, 2 replies; 31+ messages in thread
From: Matthew Wilcox @ 2025-04-08 18:51 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Luis Chamberlain, David Bueso, Jan Kara, Kefeng Wang, Tso Ted,
	Ritesh Harjani, Johannes Weiner, Oliver Sang, David Hildenbrand,
	Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke,
	oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav,
	Daniel Gomez, Dave Chinner, gost.dev, linux-fsdevel

On Tue, Apr 08, 2025 at 11:02:40AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 08, 2025 at 06:51:14PM +0100, Matthew Wilcox wrote:
> > On Tue, Apr 08, 2025 at 10:48:55AM -0700, Darrick J. Wong wrote:
> > > On Tue, Apr 08, 2025 at 10:24:40AM -0700, Luis Chamberlain wrote:
> > > > On Tue, Apr 8, 2025 at 10:06 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
> > > > > Fun
> > > > > puzzle for the community is figuring out *why* oh why did a large folio
> > > > > end up being used on buffer-heads for your use case *without* an LBS
> > > > > device (logical block size) being present, as I assume you didn't have
> > > > > one, ie say a nvme or virtio block device with logical block size  >
> > > > > PAGE_SIZE. The area in question would trigger on folio migration *only*
> > > > > if you are migrating large buffer-head folios. We only create those
> > > > 
> > > > To be clear, large folios for buffer-heads.
> > > > > if
> > > > > you have an LBS device and are leveraging the block device cache or a
> > > > > filesystem with buffer-heads with LBS (they don't exist yet other than
> > > > > the block device cache).
> > > 
> > > My guess is that udev or something tries to read the disk label in
> > > response to some uevent (mkfs, mount, unmount, etc), which creates a
> > > large folio because min_order > 0, and attaches a buffer head.  There's
> > > a separate crash report that I'll cc you on.
> > 
> > But you said:
> > 
> > > the machine is arm64 with 64k basepages and 4k fsblock size:
> > 
> > so that shouldn't be using large folios because you should have set the
> > order to 0.  Right?  Or did you mis-speak and use a 4K PAGE_SIZE kernel
> > with a 64k fsblocksize?
> 
> This particular kernel warning is arm64 with 64k base pages and a 4k
> fsblock size, and my suspicion is that udev/libblkid are creating the
> buffer heads or something weird like that.
> 
> On x64 with 4k base pages, xfs/032 creates a filesystem with 64k sector
> size and there's an actual kernel crash resulting from a udev worker:
> https://lore.kernel.org/linux-fsdevel/20250408175125.GL6266@frogsfrogsfrogs/T/#u
> 
> So I didn't misspeak, I just have two problems.  I actually have four
> problems, but the others are loop device behavior changes.

Right, but this warning only triggers for large folios.  So somehow
we've got a multi-page folio in the bdev's page cache.

Ah.  I see.

block/bdev.c:   mapping_set_folio_min_order(BD_INODE(bdev)->i_mapping,

so we're telling the bdev that it can go up to MAX_PAGECACHE_ORDER.
And then we call readahead, which will happily put order-2 folios
in the pagecache because of my bug that we've never bothered fixing.
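
(Roughly, in pagemap.h terms -- a sketch of the semantics rather than
the exact source:)

	/* setting only a minimum leaves the maximum at MAX_PAGECACHE_ORDER */
	static inline void mapping_set_folio_min_order(struct address_space *mapping,
						       unsigned int min)
	{
		mapping_set_folio_order_range(mapping, min, MAX_PAGECACHE_ORDER);
	}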

We should probably fix that now, but as a temporary measure if
you'd like to put:

mapping_set_folio_order_range(BD_INODE(bdev)->i_mapping, min, min)

instead of the mapping_set_folio_min_order(), that would make the bug
no longer appear for you.
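
Spelled out, with min_order standing in for whatever order block/bdev.c
derives from the logical block size, the swap would look something like:

	/* before: only the minimum is pinned, readahead may still go larger */
	mapping_set_folio_min_order(BD_INODE(bdev)->i_mapping, min_order);

	/* after: min and max are both pinned, so no unexpected large folios */
	mapping_set_folio_order_range(BD_INODE(bdev)->i_mapping, min_order,
				      min_order);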


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-04-08 18:51                                                     ` Matthew Wilcox
@ 2025-04-08 19:13                                                       ` Luis Chamberlain
  2025-04-08 19:13                                                       ` Luis Chamberlain
  1 sibling, 0 replies; 31+ messages in thread
From: Luis Chamberlain @ 2025-04-08 19:13 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Darrick J. Wong, David Bueso, Jan Kara, Kefeng Wang, Tso Ted,
	Ritesh Harjani, Johannes Weiner, Oliver Sang, David Hildenbrand,
	Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke,
	oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav,
	Daniel Gomez, Dave Chinner, gost.dev, linux-fsdevel

On Tue, Apr 08, 2025 at 07:51:03PM +0100, Matthew Wilcox wrote:
> On Tue, Apr 08, 2025 at 11:02:40AM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 08, 2025 at 06:51:14PM +0100, Matthew Wilcox wrote:
> > > On Tue, Apr 08, 2025 at 10:48:55AM -0700, Darrick J. Wong wrote:
> > > > On Tue, Apr 08, 2025 at 10:24:40AM -0700, Luis Chamberlain wrote:
> > > > > On Tue, Apr 8, 2025 at 10:06 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
> > > > > > Fun
> > > > > > puzzle for the community is figuring out *why* oh why did a large folio
> > > > > > end up being used on buffer-heads for your use case *without* an LBS
> > > > > > device (logical block size) being present, as I assume you didn't have
> > > > > > one, ie say a nvme or virtio block device with logical block size  >
> > > > > > PAGE_SIZE. The area in question would trigger on folio migration *only*
> > > > > > if you are migrating large buffer-head folios. We only create those
> > > > > 
> > > > > To be clear, large folios for buffer-heads.
> > > > > > if
> > > > > > you have an LBS device and are leveraging the block device cache or a
> > > > > > filesystem with buffer-heads with LBS (they don't exist yet other than
> > > > > > the block device cache).
> > > > 
> > > > My guess is that udev or something tries to read the disk label in
> > > > response to some uevent (mkfs, mount, unmount, etc), which creates a
> > > > large folio because min_order > 0, and attaches a buffer head.  There's
> > > > a separate crash report that I'll cc you on.
> > > 
> > > But you said:
> > > 
> > > > the machine is arm64 with 64k basepages and 4k fsblock size:
> > > 
> > > so that shouldn't be using large folios because you should have set the
> > > order to 0.  Right?  Or did you mis-speak and use a 4K PAGE_SIZE kernel
> > > with a 64k fsblocksize?
> > 
> > This particular kernel warning is arm64 with 64k base pages and a 4k
> > fsblock size, and my suspicion is that udev/libblkid are creating the
> > buffer heads or something weird like that.
> > 
> > On x64 with 4k base pages, xfs/032 creates a filesystem with 64k sector
> > size and there's an actual kernel crash resulting from a udev worker:
> > https://lore.kernel.org/linux-fsdevel/20250408175125.GL6266@frogsfrogsfrogs/T/#u
> > 
> > So I didn't misspeak, I just have two problems.  I actually have four
> > problems, but the others are loop device behavior changes.
> 
> Right, but this warning only triggers for large folios.  So somehow
> we've got a multi-page folio in the bdev's page cache.
> 
> Ah.  I see.
> 
> block/bdev.c:   mapping_set_folio_min_order(BD_INODE(bdev)->i_mapping,
> 
> so we're telling the bdev that it can go up to MAX_PAGECACHE_ORDER.

Ah yes, silly me, that would explain the large folios without LBS devices.

> And then we call readahead, which will happily put order-2 folios
> in the pagecache because of my bug that we've never bothered fixing.
> 
> We should probably fix that now, but as a temporary measure if
> you'd like to put:
> 
> mapping_set_folio_order_range(BD_INODE(bdev)->i_mapping, min, min)
> 
> instead of the mapping_set_folio_min_order(), that would make the bug
> no longer appear for you.

Agreed.

  Luis


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-04-08 18:51                                                     ` Matthew Wilcox
  2025-04-08 19:13                                                       ` Luis Chamberlain
@ 2025-04-08 19:13                                                       ` Luis Chamberlain
  1 sibling, 0 replies; 31+ messages in thread
From: Luis Chamberlain @ 2025-04-08 19:13 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Darrick J. Wong, David Bueso, Jan Kara, Kefeng Wang, Tso Ted,
	Ritesh Harjani, Johannes Weiner, Oliver Sang, David Hildenbrand,
	Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke,
	oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav,
	Daniel Gomez, Dave Chinner, gost.dev, linux-fsdevel

On Tue, Apr 08, 2025 at 07:51:03PM +0100, Matthew Wilcox wrote:
> And then we call readahead, which will happily put order-2 folios
> in the pagecache because of my bug that we've never bothered fixing.

What was that BTW?

  Luis


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2025-04-08 19:13 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <202503101536.27099c77-lkp@intel.com>
     [not found] ` <20250311-testphasen-behelfen-09b950bbecbf@brauner>
     [not found]   ` <Z9kEdPLNT8SOyOQT@xsang-OptiPlex-9020>
2025-03-18  8:15     ` [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c Luis Chamberlain
2025-03-18 14:37       ` Matthew Wilcox
2025-03-18 23:17         ` Luis Chamberlain
2025-03-19  2:58           ` Matthew Wilcox
2025-03-19 16:55             ` Luis Chamberlain
2025-03-19 19:16               ` Luis Chamberlain
2025-03-19 19:24                 ` Matthew Wilcox
2025-03-20 12:11                   ` Luis Chamberlain
2025-03-20 12:18                     ` Luis Chamberlain
2025-03-22 23:14                     ` Johannes Weiner
2025-03-23  1:02                       ` Luis Chamberlain
2025-03-23  7:07                         ` Luis Chamberlain
2025-03-25  6:52                           ` Oliver Sang
2025-03-28  1:44                             ` Luis Chamberlain
2025-03-28  4:21                               ` Luis Chamberlain
2025-03-28  9:47                                 ` Luis Chamberlain
2025-03-28 19:09                                   ` Luis Chamberlain
2025-03-29  0:08                                     ` Luis Chamberlain
2025-03-29  1:06                                       ` Luis Chamberlain
2025-03-31  7:45                                       ` Sebastian Andrzej Siewior
2025-04-08 16:43                                         ` Darrick J. Wong
2025-04-08 17:06                                           ` Luis Chamberlain
2025-04-08 17:24                                             ` Luis Chamberlain
2025-04-08 17:48                                               ` Darrick J. Wong
2025-04-08 17:51                                                 ` Matthew Wilcox
2025-04-08 18:02                                                   ` Darrick J. Wong
2025-04-08 18:51                                                     ` Matthew Wilcox
2025-04-08 19:13                                                       ` Luis Chamberlain
2025-04-08 19:13                                                       ` Luis Chamberlain
2025-04-08 18:06                                                 ` Luis Chamberlain
2025-03-20  1:24       ` Lai, Yi
