* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  [not found] ` <Z9kEdPLNT8SOyOQT@xsang-OptiPlex-9020>
@ 2025-03-18  8:15 ` Luis Chamberlain
  2025-03-18 14:37   ` Matthew Wilcox
  2025-03-20  1:24   ` Lai, Yi
  0 siblings, 2 replies; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-18 8:15 UTC (permalink / raw)
To: Oliver Sang, David Hildenbrand, Alistair Popple, linux-mm
Cc: Christian Brauner, Hannes Reinecke, oe-lkp, lkp,
	Matthew Wilcox (Oracle), John Garry, linux-block, ltp,
	Pankaj Raghav, Daniel Gomez

On Tue, Mar 18, 2025 at 01:28:20PM +0800, Oliver Sang wrote:
> hi, Christian Brauner,
>
> On Tue, Mar 11, 2025 at 01:10:43PM +0100, Christian Brauner wrote:
> > On Mon, Mar 10, 2025 at 03:43:49PM +0800, kernel test robot wrote:
> > >
> > > Hello,
> > >
> > > kernel test robot noticed "BUG:sleeping_function_called_from_invalid_context_at_mm/util.c" on:
> > >
> > > commit: 3c20917120ce61f2a123ca0810293872f4c6b5a4 ("block/bdev: enable large folio support for large logical block sizes")
> > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >
> > Is this also already fixed by:
> >
> > commit a64e5a596067 ("bdev: add back PAGE_SIZE block size validation for sb_set_blocksize()")
> >
> > ?
>
> sorry for late.
>
> commit a64e5a596067 cannot fix the issue. one dmesg is attached FYI.
>
> we also tried to check linux-next/master tip, but neither below one can boot
> successfully in our env which we need further check.
>
> da920b7df70177 (tag: next-20250314, linux-next/master) Add linux-next specific files for 20250314
>
> e94bd4ec45ac1 (tag: next-20250317, linux-next/master) Add linux-next specific files for 20250317
>
> so we are not sure the status of latest linux-next/master.
>
> if you want us to check other commit or other patches, please let us know. thanks!
I cannot reproduce the issue by running the LTP test manually in a loop for
a long time:

  export LTP_RUNTIME_MUL=2
  while true; do \
    ./testcases/kernel/syscalls/close_range/close_range01; done

What's the failure rate of just running the test alone above? Does it always
fail on this system? Is this a deterministic failure, or does it have a lower
failure rate?

I also can't see how the patch ("block/bdev: enable large folio support for
large logical block sizes") would trigger this. You could try this patch but ...

https://lore.kernel.org/all/20250312050028.1784117-1-mcgrof@kernel.org/

we decided this is not right and not needed, and if we have a buggy block
driver we can address that. I just can't see how this LTP test is doing
anything funky with block devices at all.

The associated sleeping-while-atomic warning is triggered during compaction
though:

[ 218.143642][ T299] Architecture: x86_64
[ 218.143659][ T299]
[ 218.427851][ T51] BUG: sleeping function called from invalid context at mm/util.c:901
[ 218.435981][ T51] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 51, name: kcompactd0
[ 218.444773][ T51] preempt_count: 1, expected: 0
[ 218.449601][ T51] RCU nest depth: 0, expected: 0
[ 218.454476][ T51] CPU: 2 UID: 0 PID: 51 Comm: kcompactd0 Tainted: G S 6.14.0-rc1-00006-g3c20917120ce #1
[ 218.454486][ T51] Tainted: [S]=CPU_OUT_OF_SPEC
[ 218.454488][ T51] Hardware name: Hewlett-Packard HP Pro 3340 MT/17A1, BIOS 8.07 01/24/2013
[ 218.454492][ T51] Call Trace:
[ 218.454495][ T51] <TASK>
[ 218.454498][ T51] dump_stack_lvl+0x4f/0x70
[ 218.454508][ T51] __might_resched+0x2c6/0x450
[ 218.454517][ T51] folio_mc_copy+0xca/0x1f0
[ 218.454525][ T51] ? _raw_spin_lock+0x81/0xe0
[ 218.454532][ T51] __migrate_folio+0x11a/0x2d0
[ 218.454541][ T51] __buffer_migrate_folio+0x558/0x660
[ 218.454548][ T51] move_to_new_folio+0xf5/0x410
[ 218.454555][ T51] migrate_folio_move+0x211/0x770
[ 218.454562][ T51] ? __pfx_compaction_free+0x10/0x10
[ 218.454572][ T51] ? __pfx_migrate_folio_move+0x10/0x10
[ 218.454578][ T51] ? compaction_alloc_noprof+0x441/0x720
[ 218.454587][ T51] ? __pfx_compaction_alloc+0x10/0x10
[ 218.454594][ T51] ? __pfx_compaction_free+0x10/0x10
[ 218.454601][ T51] ? __pfx_compaction_free+0x10/0x10
[ 218.454607][ T51] ? migrate_folio_unmap+0x329/0x890
[ 218.454614][ T51] migrate_pages_batch+0xddf/0x1810
[ 218.454621][ T51] ? __pfx_compaction_free+0x10/0x10
[ 218.454631][ T51] ? __pfx_migrate_pages_batch+0x10/0x10
[ 218.454638][ T51] ? cgroup_rstat_updated+0xf1/0x860
[ 218.454648][ T51] migrate_pages_sync+0x10c/0x8e0
[ 218.454656][ T51] ? __pfx_compaction_alloc+0x10/0x10
[ 218.454662][ T51] ? __pfx_compaction_free+0x10/0x10
[ 218.454669][ T51] ? lru_gen_del_folio+0x383/0x820
[ 218.454677][ T51] ? __pfx_migrate_pages_sync+0x10/0x10
[ 218.454683][ T51] ? set_pfnblock_flags_mask+0x179/0x220
[ 218.454691][ T51] ? __pfx_lru_gen_del_folio+0x10/0x10
[ 218.454699][ T51] ? __pfx_compaction_alloc+0x10/0x10
[ 218.454705][ T51] ? __pfx_compaction_free+0x10/0x10
[ 218.454713][ T51] migrate_pages+0x846/0xe30
[ 218.454720][ T51] ? __pfx_compaction_alloc+0x10/0x10
[ 218.454726][ T51] ? __pfx_compaction_free+0x10/0x10
[ 218.454733][ T51] ? __pfx_buffer_migrate_folio_norefs+0x10/0x10
[ 218.454740][ T51] ? __pfx_migrate_pages+0x10/0x10
[ 218.454748][ T51] ? isolate_migratepages+0x32d/0xbd0
[ 218.454757][ T51] compact_zone+0x9e1/0x1680
[ 218.454767][ T51] ? __pfx_compact_zone+0x10/0x10
[ 218.454774][ T51] ? _raw_spin_lock_irqsave+0x87/0xe0
[ 218.454780][ T51] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 218.454788][ T51] compact_node+0x159/0x250
[ 218.454795][ T51] ? __pfx_compact_node+0x10/0x10
[ 218.454807][ T51] ? __pfx_extfrag_for_order+0x10/0x10
[ 218.454814][ T51] ? __pfx_mutex_unlock+0x10/0x10
[ 218.454822][ T51] ? finish_wait+0xd1/0x280
[ 218.454831][ T51] kcompactd+0x582/0x960
[ 218.454839][ T51] ? __pfx_kcompactd+0x10/0x10
[ 218.454846][ T51] ? _raw_spin_lock_irqsave+0x87/0xe0
[ 218.454852][ T51] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 218.454858][ T51] ? __pfx_autoremove_wake_function+0x10/0x10
[ 218.454867][ T51] ? __kthread_parkme+0xba/0x1e0
[ 218.454874][ T51] ? __pfx_kcompactd+0x10/0x10
[ 218.454880][ T51] kthread+0x3a1/0x770
[ 218.454887][ T51] ? __pfx_kthread+0x10/0x10
[ 218.454895][ T51] ? __pfx_kthread+0x10/0x10
[ 218.454902][ T51] ret_from_fork+0x30/0x70
[ 218.454910][ T51] ? __pfx_kthread+0x10/0x10
[ 218.454915][ T51] ret_from_fork_asm+0x1a/0x30
[ 218.454924][ T51] </TASK>

So the only thing I can think of is that the patch pushes more large folios
into use, and so compaction can be a secondary effect which managed to
trigger another mm issue. I know there was a recent migration fix but I
can't see the relationship at all either.

  Luis

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-18  8:15 ` [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c Luis Chamberlain
@ 2025-03-18 14:37   ` Matthew Wilcox
  2025-03-18 23:17     ` Luis Chamberlain
  2025-03-20  1:24   ` Lai, Yi
  1 sibling, 1 reply; 31+ messages in thread
From: Matthew Wilcox @ 2025-03-18 14:37 UTC (permalink / raw)
To: Luis Chamberlain, Jan Kara
Cc: Oliver Sang, David Hildenbrand, Alistair Popple, linux-mm,
	Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry,
	linux-block, ltp, Pankaj Raghav, Daniel Gomez

On Tue, Mar 18, 2025 at 01:15:33AM -0700, Luis Chamberlain wrote:
> I also can't see how the patch ("block/bdev: enable large folio
> support for large logical block sizes") would trigger this.

Easy enough to see by checking the backtrace.

> [ 218.454517][ T51] folio_mc_copy+0xca/0x1f0
> [ 218.454532][ T51] __migrate_folio+0x11a/0x2d0
> [ 218.454541][ T51] __buffer_migrate_folio+0x558/0x660

folio_mc_copy() calls cond_resched() for large folios only.
__buffer_migrate_folio() calls spin_lock(&mapping->i_private_lock),
so for folios without buffer heads attached we never take the spinlock,
and for small folios we never call cond_resched(). It's only the
compaction path for large folios with buffer heads attached that
calls cond_resched() while holding a spinlock.

Jan was the one who extended the spinlock to be held over the copy
in ebdf4de5642f, so adding him for thoughts.

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-18 14:37 ` Matthew Wilcox
@ 2025-03-18 23:17   ` Luis Chamberlain
  2025-03-19  2:58     ` Matthew Wilcox
  0 siblings, 1 reply; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-18 23:17 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Jan Kara, Oliver Sang, David Hildenbrand, Alistair Popple,
	linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp,
	John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez

On Tue, Mar 18, 2025 at 02:37:29PM +0000, Matthew Wilcox wrote:
> On Tue, Mar 18, 2025 at 01:15:33AM -0700, Luis Chamberlain wrote:
> > I also can't see how the patch ("block/bdev: enable large folio
> > support for large logical block sizes") would trigger this.
>
> Easy enough to see by checking the backtrace.
>
> > [ 218.454517][ T51] folio_mc_copy+0xca/0x1f0
> > [ 218.454532][ T51] __migrate_folio+0x11a/0x2d0
> > [ 218.454541][ T51] __buffer_migrate_folio+0x558/0x660
>
> folio_mc_copy() calls cond_resched() for large folios only.
> __buffer_migrate_folio() calls spin_lock(&mapping->i_private_lock)
>
> so for folios without buffer heads attached, we never take the spinlock,
> and for small folios we never call cond_resched(). It's only the
> compaction path for large folios with buffer_heads attached that
> calls cond_resched() while holding a spinlock.
>
> Jan was the one who extended the spinlock to be held over the copy
> in ebdf4de5642f so adding him for thoughts.

Ah, then that LTP test isn't going to easily reproduce bugs around
compaction. To help proactively find compaction bugs more
deterministically we wrote generic/750, and indeed we can easily see
issues creep up with SOAK_DURATION=9000 on ext4 on linux-next as of
yesterday, next-20250317.
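For anyone wanting to reproduce a soak run like the one described, generic/750 is driven through the fstests `./check` script with `SOAK_DURATION` exported. The fragment below is a sketch assuming a standard fstests checkout; the device paths and mount points are placeholders, not the configuration of the rig that produced the splats above.

```shell
# Hypothetical fstests local.config for soaking generic/750 on ext4;
# replace the device paths with your own disposable test devices.
FSTYP=ext4
TEST_DEV=/dev/vdb
TEST_DIR=/mnt/test
SCRATCH_DEV=/dev/vdc
SCRATCH_MNT=/mnt/scratch

# Then, from the fstests checkout, soak for 9000 seconds (2.5 hours):
#   SOAK_DURATION=9000 ./check generic/750
```

generic/750 runs an fsstress workload while cycling memory compaction, so longer `SOAK_DURATION` values give compaction more chances to race against in-flight folio migration.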
Mar 18 07:10:59 extra-ext4-defaults kernel: Linux version 6.14.0-rc7-next-20250317 (mcgrof@beef) (gcc (Debian 14.2.0-16) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #30 SMP PREEMPT_DYNAMIC Tue Mar 18 07:05:01 UTC 2025
Mar 18 07:10:59 extra-ext4-defaults kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.14.0-rc7-next-20250317 root=PARTUUID=503fa6f2-2d5b-4d7e-8cf8-3a811de326ce ro console=tty0 console=tty1 console=ttyS0,115200n8 console=ttyS0
Mar 18 07:10:59 extra-ext4-defaults kernel: BIOS-provided physical RAM map:

<-- etc -->

Mar 18 23:09:29 extra-ext4-defaults kernel: EXT4-fs (loop16): mounted filesystem dc4fc2d3-efb6-4c07-8e2d-e9cf1f9f9773 r/w with ordered data mode. Quota mode: none.
Mar 18 23:09:32 extra-ext4-defaults kernel: EXT4-fs (loop5): mounted filesystem 08064f5c-03f9-4176-a738-ca5df9f258de r/w with ordered data mode. Quota mode: none.
Mar 18 23:09:32 extra-ext4-defaults kernel: EXT4-fs (loop5): unmounting filesystem 08064f5c-03f9-4176-a738-ca5df9f258de.
Mar 18 23:09:32 extra-ext4-defaults kernel: EXT4-fs (loop16): unmounting filesystem dc4fc2d3-efb6-4c07-8e2d-e9cf1f9f9773.
Mar 18 23:09:32 extra-ext4-defaults kernel: EXT4-fs (loop16): mounted filesystem dc4fc2d3-efb6-4c07-8e2d-e9cf1f9f9773 r/w with ordered data mode. Quota mode: none.
Mar 18 23:09:32 extra-ext4-defaults unknown: run fstests generic/750 at 2025-03-18 23:09:32
Mar 18 23:09:33 extra-ext4-defaults kernel: EXT4-fs (loop5): mounted filesystem bf5fcb06-8f03-4384-bd24-3a88418a08c3 r/w with ordered data mode. Quota mode: none.
Mar 18 23:10:21 extra-ext4-defaults kernel: BUG: unable to handle page fault for address: ffff9d5640010c48
Mar 18 23:10:21 extra-ext4-defaults kernel: #PF: supervisor read access in kernel mode
Mar 18 23:10:21 extra-ext4-defaults kernel: #PF: error_code(0x0000) - not-present page
Mar 18 23:10:21 extra-ext4-defaults kernel: PGD 38601067 P4D 38601067 PUD 0
Mar 18 23:10:21 extra-ext4-defaults kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
Mar 18 23:10:21 extra-ext4-defaults kernel: CPU: 0 UID: 0 PID: 74 Comm: kcompactd0 Not tainted 6.14.0-rc7-next-20250317 #30
Mar 18 23:10:21 extra-ext4-defaults kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.11-5 01/28/2025
Mar 18 23:10:21 extra-ext4-defaults kernel: RIP: 0010:__zone_watermark_ok+0x4e/0x1e0
Mar 18 23:10:21 extra-ext4-defaults kernel: Code: 00 00 00 41 f7 c0 38 02 00 00 0f 85 2c 01 00 00 48 8b 4f 30 48 63 d2 48 01 ca 85 db 0f 84 f3 00 00 00 49 29 d1 bb 80 00 00 00 <4c> 03 54 f7 38 31 d2 4d 39 ca 0f 8d d2 00 00 00 ba 01 00 00 00 85
Mar 18 23:10:21 extra-ext4-defaults kernel: RSP: 0018:ffffbf47c02b7c78 EFLAGS: 00010202
Mar 18 23:10:21 extra-ext4-defaults kernel: RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000
Mar 18 23:10:21 extra-ext4-defaults kernel: RDX: 0000000000000000 RSI: 0000000000002f52 RDI: ffff9d563fff9180
Mar 18 23:10:21 extra-ext4-defaults kernel: RBP: 0000000000000009 R08: 0000000000000080 R09: 00000000000030a1
Mar 18 23:10:21 extra-ext4-defaults kernel: R10: 0000000000000be4 R11: 0000000000000be4 R12: 0000000000000002
Mar 18 23:10:21 extra-ext4-defaults kernel: R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000002f52
Mar 18 23:10:21 extra-ext4-defaults kernel: FS: 0000000000000000(0000) GS:ffff9d56b6cce000(0000) knlGS:0000000000000000
Mar 18 23:10:21 extra-ext4-defaults kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 18 23:10:21 extra-ext4-defaults kernel: CR2: ffff9d5640010c48 CR3: 0000000115920006 CR4: 0000000000772ef0
Mar 18 23:10:21 extra-ext4-defaults kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 18 23:10:21 extra-ext4-defaults kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 18 23:10:21 extra-ext4-defaults kernel: PKRU: 55555554
Mar 18 23:10:21 extra-ext4-defaults kernel: Call Trace:
Mar 18 23:10:21 extra-ext4-defaults kernel: <TASK>
Mar 18 23:10:21 extra-ext4-defaults kernel: ? __die_body.cold+0x19/0x28
Mar 18 23:10:21 extra-ext4-defaults kernel: ? page_fault_oops+0xa1/0x230
Mar 18 23:10:21 extra-ext4-defaults kernel: ? search_module_extables+0x40/0x60
Mar 18 23:10:21 extra-ext4-defaults kernel: ? __zone_watermark_ok+0x4e/0x1e0
Mar 18 23:10:21 extra-ext4-defaults kernel: ? search_bpf_extables+0x5b/0x80
Mar 18 23:10:21 extra-ext4-defaults kernel: ? exc_page_fault+0x16d/0x190
Mar 18 23:10:21 extra-ext4-defaults kernel: ? asm_exc_page_fault+0x22/0x30
Mar 18 23:10:21 extra-ext4-defaults kernel: ? __zone_watermark_ok+0x4e/0x1e0
Mar 18 23:10:21 extra-ext4-defaults kernel: ? hrtimer_try_to_cancel+0x78/0x110
Mar 18 23:10:21 extra-ext4-defaults kernel: compaction_suitable+0x4b/0xf0
Mar 18 23:10:21 extra-ext4-defaults kernel: compaction_suit_allocation_order+0x8f/0x110
Mar 18 23:10:21 extra-ext4-defaults kernel: kcompactd_do_work+0xbc/0x260
Mar 18 23:10:21 extra-ext4-defaults kernel: kcompactd+0x396/0x3e0
Mar 18 23:10:21 extra-ext4-defaults kernel: ? __pfx_autoremove_wake_function+0x10/0x10
Mar 18 23:10:21 extra-ext4-defaults kernel: ? __pfx_kcompactd+0x10/0x10
Mar 18 23:10:21 extra-ext4-defaults kernel: kthread+0xf6/0x240
Mar 18 23:10:21 extra-ext4-defaults kernel: ? __pfx_kthread+0x10/0x10
Mar 18 23:10:21 extra-ext4-defaults kernel: ? _raw_spin_unlock+0x15/0x30
Mar 18 23:10:21 extra-ext4-defaults kernel: ? finish_task_switch.isra.0+0x94/0x290
Mar 18 23:10:21 extra-ext4-defaults kernel: ? __pfx_kthread+0x10/0x10
Mar 18 23:10:21 extra-ext4-defaults kernel: ret_from_fork+0x2d/0x50
Mar 18 23:10:21 extra-ext4-defaults kernel: ? __pfx_kthread+0x10/0x10
Mar 18 23:10:21 extra-ext4-defaults kernel: ret_from_fork_asm+0x1a/0x30
Mar 18 23:10:21 extra-ext4-defaults kernel: </TASK>
Mar 18 23:10:21 extra-ext4-defaults kernel: Modules linked in: exfat xfs ext2 loop sunrpc 9p nls_iso8859_1 nls_cp437 crc32c_generic vfat fat kvm_intel kvm ghash_clmulni_intel sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 aesni_intel gf128mul crypto_simd cryptd 9pnet_virtio virtio_console virtio_balloon button joydev evdev serio_raw nvme_fabrics dm_mod nvme_core drm vsock_loopback vmw_vsock_virtio_transport_common vsock nfnetlink autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 md_mod virtio_net net_failover failover virtio_blk psmouse virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring
Mar 18 23:10:21 extra-ext4-defaults kernel: CR2: ffff9d5640010c48
Mar 18 23:10:21 extra-ext4-defaults kernel: ---[ end trace 0000000000000000 ]---
Mar 18 23:10:21 extra-ext4-defaults kernel: RIP: 0010:__zone_watermark_ok+0x4e/0x1e0
Mar 18 23:10:21 extra-ext4-defaults kernel: Code: 00 00 00 41 f7 c0 38 02 00 00 0f 85 2c 01 00 00 48 8b 4f 30 48 63 d2 48 01 ca 85 db 0f 84 f3 00 00 00 49 29 d1 bb 80 00 00 00 <4c> 03 54 f7 38 31 d2 4d 39 ca 0f 8d d2 00 00 00 ba 01 00 00 00 85
Mar 18 23:10:21 extra-ext4-defaults kernel: RSP: 0018:ffffbf47c02b7c78 EFLAGS: 00010202
Mar 18 23:10:21 extra-ext4-defaults kernel: RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000
Mar 18 23:10:21 extra-ext4-defaults kernel: RDX: 0000000000000000 RSI: 0000000000002f52 RDI: ffff9d563fff9180
Mar 18 23:10:21 extra-ext4-defaults kernel: RBP: 0000000000000009 R08: 0000000000000080 R09: 00000000000030a1
Mar 18 23:10:21 extra-ext4-defaults kernel: R10: 0000000000000be4 R11: 0000000000000be4 R12: 0000000000000002
Mar 18 23:10:21 extra-ext4-defaults kernel: R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000002f52
Mar 18 23:10:21 extra-ext4-defaults kernel: FS: 0000000000000000(0000) GS:ffff9d56b6cce000(0000) knlGS:0000000000000000
Mar 18 23:10:21 extra-ext4-defaults kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 18 23:10:21 extra-ext4-defaults kernel: CR2: ffff9d5640010c48 CR3: 0000000115920006 CR4: 0000000000772ef0
Mar 18 23:10:21 extra-ext4-defaults kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 18 23:10:21 extra-ext4-defaults kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 18 23:10:21 extra-ext4-defaults kernel: PKRU: 55555554
Mar 18 23:10:21 extra-ext4-defaults kernel: note: kcompactd0[74] exited with irqs disabled

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-18 23:17 ` Luis Chamberlain
@ 2025-03-19  2:58   ` Matthew Wilcox
  2025-03-19 16:55     ` Luis Chamberlain
  0 siblings, 1 reply; 31+ messages in thread
From: Matthew Wilcox @ 2025-03-19 2:58 UTC (permalink / raw)
To: Luis Chamberlain
Cc: Jan Kara, Oliver Sang, David Hildenbrand, Alistair Popple,
	linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp,
	John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez

On Tue, Mar 18, 2025 at 04:17:54PM -0700, Luis Chamberlain wrote:
> Ah, then that LTP test isn't going to easily reproduce bugs around
> compaction. To help proactively find compaction bugs more
> deterministically we wrote generic/750 and indeed we can easily see
> issues creep up with a SOAK_DURATION=9000 on ext4 on linux-next as of
> yesterday next-20250317.

Umm .. this is an entirely separate bug. How much CONFIG_DEBUG do you
have enabled (ie, is this a consequence of something that we have an
assert for, but you've disabled?)
> BUG: unable to handle page fault for address: ffff9d5640010c48
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 38601067 P4D 38601067 PUD 0
> Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 0 UID: 0 PID: 74 Comm: kcompactd0 Not tainted 6.14.0-rc7-next-20250317 #30
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.11-5 01/28/2025
> RIP: 0010:__zone_watermark_ok+0x4e/0x1e0
> Code: 00 00 00 41 f7 c0 38 02 00 00 0f 85 2c 01 00 00 48 8b 4f 30 48 63 d2 48 01 ca 85 db 0f 84 f3 00 00 00 49 29 d1 bb 80 00 00 00 <4c> 03 54 f7 38 31 d2 4d 39 ca 0f 8d d2 00 00 00 ba 01 00 00 00 85
> RSP: 0018:ffffbf47c02b7c78 EFLAGS: 00010202
> RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000002f52 RDI: ffff9d563fff9180
> RBP: 0000000000000009 R08: 0000000000000080 R09: 00000000000030a1
> R10: 0000000000000be4 R11: 0000000000000be4 R12: 0000000000000002
> R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000002f52

2a:* 4c 03 54 f7 38    add 0x38(%rdi,%rsi,8),%r10    <-- trapping instruction

Not quite sure what this is. Perhaps running this through
decode_stacktrace.sh would be helpful?

> FS: 0000000000000000(0000) GS:ffff9d56b6cce000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffff9d5640010c48 CR3: 0000000115920006 CR4: 0000000000772ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
> <TASK>
> ? __die_body.cold+0x19/0x28
> ? page_fault_oops+0xa1/0x230
> ? search_module_extables+0x40/0x60
> ? __zone_watermark_ok+0x4e/0x1e0
> ? search_bpf_extables+0x5b/0x80
> ? exc_page_fault+0x16d/0x190
> ? __zone_watermark_ok+0x4e/0x1e0
> ? hrtimer_try_to_cancel+0x78/0x110
> compaction_suit_allocation_order+0x8f/0x110
> kcompactd_do_work+0xbc/0x260
> kcompactd+0x396/0x3e0
> ? __pfx_autoremove_wake_function+0x10/0x10
> ? 
__pfx_kcompactd+0x10/0x10 > kthread+0xf6/0x240 > ? __pfx_kthread+0x10/0x10 > ? _raw_spin_unlock+0x15/0x30 > ? finish_task_switch.isra.0+0x94/0x290 > ? __pfx_kthread+0x10/0x10 > ret_from_fork+0x2d/0x50 > ? __pfx_kthread+0x10/0x10 > ret_from_fork_asm+0x1a/0x30 > </TASK> > Modules linked in: exfat xfs ext2 loop sunrpc 9p nls_iso8859_1 nls_cp437 crc32c_generic vfat fat kvm_intel kvm ghash_clmulni_intel sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 aesni_intel gf128mul crypto_simd cryptd 9pnet_virtio virtio_console virtio_balloon button joydev evdev serio_raw nvme_fabrics dm_mod nvme_core drm vsock_loopback vmw_vsock_virtio_transport_common vsock nfnetlink autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 md_mod virtio_net net_failover failover virtio_blk psmouse virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring > CR2: ffff9d5640010c48 > ---[ end trace 0000000000000000 ]--- > RIP: 0010:__zone_watermark_ok+0x4e/0x1e0 > Code: 00 00 00 41 f7 c0 38 02 00 00 0f 85 2c 01 00 00 48 8b 4f 30 48 63 d2 48 01 ca 85 db 0f 84 f3 00 00 00 49 29 d1 bb 80 00 00 00 <4c> 03 54 f7 38 31 d2 4d 39 ca 0f 8d d2 00 00 00 ba 01 00 00 00 85 > RSP: 0018:ffffbf47c02b7c78 EFLAGS: 00010202 > RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000 > RDX: 0000000000000000 RSI: 0000000000002f52 RDI: ffff9d563fff9180 > RBP: 0000000000000009 R08: 0000000000000080 R09: 00000000000030a1 > R10: 0000000000000be4 R11: 0000000000000be4 R12: 0000000000000002 > R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000002f52 > FS: 0000000000000000(0000) GS:ffff9d56b6cce000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: ffff9d5640010c48 CR3: 0000000115920006 CR4: 0000000000772ef0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > PKRU: 
55555554 > note: kcompactd0[74] exited with irqs disabled ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-19  2:58 ` Matthew Wilcox
@ 2025-03-19 16:55   ` Luis Chamberlain
  2025-03-19 19:16     ` Luis Chamberlain
  0 siblings, 1 reply; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-19 16:55 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Jan Kara, Oliver Sang, David Hildenbrand, Alistair Popple,
	linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp,
	John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez,
	David Bueso

On Wed, Mar 19, 2025 at 02:58:38AM +0000, Matthew Wilcox wrote:
> On Tue, Mar 18, 2025 at 04:17:54PM -0700, Luis Chamberlain wrote:
> > Ah, then that LTP test isn't going to easily reproduce bugs around
> > compaction. To help proactively find compaction bugs more
> > deterministically we wrote generic/750 and indeed we can easily see
> > issues creep up with a SOAK_DURATION=9000 on ext4 on linux-next as of
> > yesterday next-20250317.
>
> Umm .. this is an entirely separate bug. How much CONFIG_DEBUG do you
> have enabled (ie is this a consequence of something that we have an
> assert for, but you've disabled?)

grep ^CONFIG_DEBUG .config
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_MISC=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_DEBUG_INFO_COMPRESSED_NONE=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_FS_ALLOW_ALL=y
CONFIG_DEBUG_WX=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=16000
CONFIG_DEBUG_VM_IRQSOFF=y
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_PGFLAGS=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_PREEMPT=y
CONFIG_DEBUG_LIST=y
CONFIG_DEBUG_MAPLE_TREE=y

Let me know if you want me to enable some other ones, these are always
enabled on any kdevops reportings.

> Not quite sure what this is. Perhaps running this through
> decode_stacktrace.sh would be helpful?

Sure, here is a fresh splat on next-20250317.
What can be seen here is that the issue can be easily reproduced within
just one minute of the test running. FWIW, I'm not seeing this crash or
any kernel splat within the same time (I'll let this run the full 2.5
hours now to verify) on vanilla 6.14.0-rc3 + the 64k-sector-size patches,
which would explain why I hadn't seen this in my earlier testing over 10
ext4 profiles on fstests. This particular crash seems likely to be an
artifact of the development cycle on next-20250317.

Mar 19 16:20:41 extra-ext4-defaults kernel: Linux version 6.14.0-rc7-next-20250317 (mcgrof@beef) (gcc (Debian 14.2.0-16) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #32 SMP PREEMPT_DYNAMIC Wed Mar 19 16:18:39 UTC 2025
Mar 19 16:20:41 extra-ext4-defaults kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.14.0-rc7-next-20250317 root=PARTUUID=503fa6f2-2d5b-4d7e-8cf8-3a811de326ce ro console=tty0 console=tty1 console=ttyS0,115200n8 console=ttyS0

< etc >

Mar 19 16:21:23 extra-ext4-defaults kernel: EXT4-fs (loop16): mounted filesystem 200cf81b-dd0f-4614-8c4b-6f4af34aa9ff r/w with ordered data mode. Quota mode: none.
Mar 19 16:21:29 extra-ext4-defaults kernel: EXT4-fs (loop5): mounted filesystem cd905b7c-532b-4244-96b7-d2b393f3b16e r/w with ordered data mode. Quota mode: none.
Mar 19 16:21:29 extra-ext4-defaults kernel: EXT4-fs (loop5): unmounting filesystem cd905b7c-532b-4244-96b7-d2b393f3b16e.
Mar 19 16:21:29 extra-ext4-defaults kernel: EXT4-fs (loop16): unmounting filesystem 200cf81b-dd0f-4614-8c4b-6f4af34aa9ff.
Mar 19 16:21:29 extra-ext4-defaults kernel: EXT4-fs (loop16): mounted filesystem 200cf81b-dd0f-4614-8c4b-6f4af34aa9ff r/w with ordered data mode. Quota mode: none.
Mar 19 16:21:29 extra-ext4-defaults unknown: run fstests generic/750 at 2025-03-19 16:21:29
Mar 19 16:21:30 extra-ext4-defaults kernel: EXT4-fs (loop5): mounted filesystem f7af9558-57b0-4266-8326-a1bdda0be33a r/w with ordered data mode. Quota mode: none.
Mar 19 16:22:28 extra-ext4-defaults kernel: BUG: unable to handle page fault for address: ffff8f0e00013350 Mar 19 16:22:28 extra-ext4-defaults kernel: #PF: supervisor read access in kernel mode Mar 19 16:22:28 extra-ext4-defaults kernel: #PF: error_code(0x0000) - not-present page Mar 19 16:22:28 extra-ext4-defaults kernel: PGD 158401067 P4D 158401067 PUD 0 Mar 19 16:22:28 extra-ext4-defaults kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI Mar 19 16:22:28 extra-ext4-defaults kernel: CPU: 2 UID: 0 PID: 74 Comm: kcompactd0 Not tainted 6.14.0-rc7-next-20250317 #32 Mar 19 16:22:28 extra-ext4-defaults kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.11-5 01/28/2025 Mar 19 16:22:28 extra-ext4-defaults kernel: RIP: 0010:__zone_watermark_ok (mm/page_alloc.c:3339) Mar 19 16:22:28 extra-ext4-defaults kernel: Code: 00 00 00 41 f7 c0 38 02 00 00 0f 85 2c 01 00 00 48 8b 4f 30 48 63 d2 48 01 ca 85 db 0f 84 f3 00 00 00 49 29 d1 bb 80 00 00 00 <4c> 03 54 f7 38 31 d2 4d 39 ca 0f 8d d2 00 00 00 ba 01 00 00 00 85 All code ======== 0: 00 00 add %al,(%rax) 2: 00 41 f7 add %al,-0x9(%rcx) 5: c0 38 02 sarb $0x2,(%rax) 8: 00 00 add %al,(%rax) a: 0f 85 2c 01 00 00 jne 0x13c 10: 48 8b 4f 30 mov 0x30(%rdi),%rcx 14: 48 63 d2 movslq %edx,%rdx 17: 48 01 ca add %rcx,%rdx 1a: 85 db test %ebx,%ebx 1c: 0f 84 f3 00 00 00 je 0x115 22: 49 29 d1 sub %rdx,%r9 25: bb 80 00 00 00 mov $0x80,%ebx 2a:* 4c 03 54 f7 38 add 0x38(%rdi,%rsi,8),%r10 <-- trapping instruction 2f: 31 d2 xor %edx,%edx 31: 4d 39 ca cmp %r9,%r10 34: 0f 8d d2 00 00 00 jge 0x10c 3a: ba 01 00 00 00 mov $0x1,%edx 3f: 85 .byte 0x85 Code starting with the faulting instruction =========================================== 0: 4c 03 54 f7 38 add 0x38(%rdi,%rsi,8),%r10 5: 31 d2 xor %edx,%edx 7: 4d 39 ca cmp %r9,%r10 a: 0f 8d d2 00 00 00 jge 0xe2 10: ba 01 00 00 00 mov $0x1,%edx 15: 85 .byte 0x85 Mar 19 16:22:28 extra-ext4-defaults kernel: RSP: 0018:ffffa3ed002b7c78 EFLAGS: 00010202 Mar 19 16:22:28 extra-ext4-defaults kernel: RAX: 
0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000 Mar 19 16:22:28 extra-ext4-defaults kernel: RDX: 0000000000000000 RSI: 0000000000003033 RDI: ffff8f0dffffb180 Mar 19 16:22:28 extra-ext4-defaults kernel: RBP: 0000000000000009 R08: 0000000000000080 R09: 0000000000002ffb Mar 19 16:22:28 extra-ext4-defaults kernel: R10: 0000000000000c09 R11: 0000000000000c09 R12: 0000000000000002 Mar 19 16:22:28 extra-ext4-defaults kernel: R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000003033 Mar 19 16:22:28 extra-ext4-defaults kernel: FS: 0000000000000000(0000) GS:ffff8f0e72f4e000(0000) knlGS:0000000000000000 Mar 19 16:22:28 extra-ext4-defaults kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Mar 19 16:22:28 extra-ext4-defaults kernel: CR2: ffff8f0e00013350 CR3: 0000000116942002 CR4: 0000000000772ef0 Mar 19 16:22:28 extra-ext4-defaults kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Mar 19 16:22:28 extra-ext4-defaults kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Mar 19 16:22:28 extra-ext4-defaults kernel: PKRU: 55555554 Mar 19 16:22:28 extra-ext4-defaults kernel: Call Trace: Mar 19 16:22:28 extra-ext4-defaults kernel: <TASK> Mar 19 16:22:28 extra-ext4-defaults kernel: ? __die_body.cold (arch/x86/kernel/dumpstack.c:478 (discriminator 1) arch/x86/kernel/dumpstack.c:465 (discriminator 1) arch/x86/kernel/dumpstack.c:420 (discriminator 1)) Mar 19 16:22:28 extra-ext4-defaults kernel: ? page_fault_oops (arch/x86/mm/fault.c:710 (discriminator 1)) Mar 19 16:22:28 extra-ext4-defaults kernel: ? search_module_extables (kernel/module/main.c:3687) Mar 19 16:22:28 extra-ext4-defaults kernel: ? __zone_watermark_ok (mm/page_alloc.c:3339) Mar 19 16:22:28 extra-ext4-defaults kernel: ? search_bpf_extables (kernel/bpf/core.c:804) Mar 19 16:22:28 extra-ext4-defaults kernel: ? 
exc_page_fault (arch/x86/mm/fault.c:1182 (discriminator 1) arch/x86/mm/fault.c:1478 (discriminator 1) arch/x86/mm/fault.c:1538 (discriminator 1)) Mar 19 16:22:28 extra-ext4-defaults kernel: ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:574) Mar 19 16:22:28 extra-ext4-defaults kernel: ? __zone_watermark_ok (mm/page_alloc.c:3339) Mar 19 16:22:28 extra-ext4-defaults kernel: compaction_suitable (mm/compaction.c:2454) Mar 19 16:22:28 extra-ext4-defaults kernel: compaction_suit_allocation_order (mm/compaction.c:2547) Mar 19 16:22:28 extra-ext4-defaults kernel: kcompactd_do_work (mm/compaction.c:3129) Mar 19 16:22:28 extra-ext4-defaults kernel: kcompactd (mm/compaction.c:3243) Mar 19 16:22:28 extra-ext4-defaults kernel: ? __pfx_autoremove_wake_function (kernel/sched/wait.c:383) Mar 19 16:22:28 extra-ext4-defaults kernel: ? __pfx_kcompactd (mm/compaction.c:3207) Mar 19 16:22:28 extra-ext4-defaults kernel: kthread (kernel/kthread.c:464) Mar 19 16:22:28 extra-ext4-defaults kernel: ? __pfx_kthread (kernel/kthread.c:413) Mar 19 16:22:28 extra-ext4-defaults kernel: ? _raw_spin_unlock (./include/linux/spinlock_api_smp.h:143 (discriminator 3) kernel/locking/spinlock.c:186 (discriminator 3)) Mar 19 16:22:28 extra-ext4-defaults kernel: ? finish_task_switch.isra.0 (./arch/x86/include/asm/paravirt.h:686 kernel/sched/sched.h:1533 kernel/sched/core.c:5125 kernel/sched/core.c:5243) Mar 19 16:22:28 extra-ext4-defaults kernel: ? __pfx_kthread (kernel/kthread.c:413) Mar 19 16:22:28 extra-ext4-defaults kernel: ret_from_fork (arch/x86/kernel/process.c:153) Mar 19 16:22:28 extra-ext4-defaults kernel: ? 
__pfx_kthread (kernel/kthread.c:413) Mar 19 16:22:28 extra-ext4-defaults kernel: ret_from_fork_asm (arch/x86/entry/entry_64.S:258) Mar 19 16:22:28 extra-ext4-defaults kernel: </TASK> Mar 19 16:22:28 extra-ext4-defaults kernel: Modules linked in: loop sunrpc 9p nls_iso8859_1 nls_cp437 crc32c_generic vfat fat kvm_intel kvm ghash_clmulni_intel sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 aesni_intel gf128mul crypto_simd virtio_balloon cryptd 9pnet_virtio virtio_console joydev evdev button serio_raw nvme_fabrics dm_mod nvme_core drm nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vsock autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 md_mod virtio_net net_failover failover virtio_blk virtio_pci psmouse virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring Mar 19 16:22:28 extra-ext4-defaults kernel: CR2: ffff8f0e00013350 Mar 19 16:22:28 extra-ext4-defaults kernel: ---[ end trace 0000000000000000 ]--- Mar 19 16:22:28 extra-ext4-defaults kernel: RIP: 0010:__zone_watermark_ok (mm/page_alloc.c:3339) Mar 19 16:22:28 extra-ext4-defaults kernel: Code: 00 00 00 41 f7 c0 38 02 00 00 0f 85 2c 01 00 00 48 8b 4f 30 48 63 d2 48 01 ca 85 db 0f 84 f3 00 00 00 49 29 d1 bb 80 00 00 00 <4c> 03 54 f7 38 31 d2 4d 39 ca 0f 8d d2 00 00 00 ba 01 00 00 00 85 All code ======== 0: 00 00 add %al,(%rax) 2: 00 41 f7 add %al,-0x9(%rcx) 5: c0 38 02 sarb $0x2,(%rax) 8: 00 00 add %al,(%rax) a: 0f 85 2c 01 00 00 jne 0x13c 10: 48 8b 4f 30 mov 0x30(%rdi),%rcx 14: 48 63 d2 movslq %edx,%rdx 17: 48 01 ca add %rcx,%rdx 1a: 85 db test %ebx,%ebx 1c: 0f 84 f3 00 00 00 je 0x115 22: 49 29 d1 sub %rdx,%r9 25: bb 80 00 00 00 mov $0x80,%ebx 2a:* 4c 03 54 f7 38 add 0x38(%rdi,%rsi,8),%r10 <-- trapping instruction 2f: 31 d2 xor %edx,%edx 31: 4d 39 ca cmp %r9,%r10 34: 0f 8d d2 00 00 00 jge 0x10c 3a: ba 01 00 00 00 mov $0x1,%edx 3f: 85 .byte 0x85 Code starting with the faulting 
instruction =========================================== 0: 4c 03 54 f7 38 add 0x38(%rdi,%rsi,8),%r10 5: 31 d2 xor %edx,%edx 7: 4d 39 ca cmp %r9,%r10 a: 0f 8d d2 00 00 00 jge 0xe2 10: ba 01 00 00 00 mov $0x1,%edx 15: 85 .byte 0x85 Mar 19 16:22:28 extra-ext4-defaults kernel: RSP: 0018:ffffa3ed002b7c78 EFLAGS: 00010202 Mar 19 16:22:28 extra-ext4-defaults kernel: RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000 Mar 19 16:22:28 extra-ext4-defaults kernel: RDX: 0000000000000000 RSI: 0000000000003033 RDI: ffff8f0dffffb180 Mar 19 16:22:28 extra-ext4-defaults kernel: RBP: 0000000000000009 R08: 0000000000000080 R09: 0000000000002ffb Mar 19 16:22:28 extra-ext4-defaults kernel: R10: 0000000000000c09 R11: 0000000000000c09 R12: 0000000000000002 Mar 19 16:22:28 extra-ext4-defaults kernel: R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000003033 Mar 19 16:22:28 extra-ext4-defaults kernel: FS: 0000000000000000(0000) GS:ffff8f0e72f4e000(0000) knlGS:0000000000000000 Mar 19 16:22:28 extra-ext4-defaults kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Mar 19 16:22:28 extra-ext4-defaults kernel: CR2: ffff8f0e00013350 CR3: 0000000116942002 CR4: 0000000000772ef0 Mar 19 16:22:28 extra-ext4-defaults kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Mar 19 16:22:28 extra-ext4-defaults kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Mar 19 16:22:28 extra-ext4-defaults kernel: PKRU: 55555554 Mar 19 16:22:28 extra-ext4-defaults kernel: note: kcompactd0[74] exited with irqs disabled ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c 2025-03-19 16:55 ` Luis Chamberlain @ 2025-03-19 19:16 ` Luis Chamberlain 2025-03-19 19:24 ` Matthew Wilcox 0 siblings, 1 reply; 31+ messages in thread From: Luis Chamberlain @ 2025-03-19 19:16 UTC (permalink / raw) To: Matthew Wilcox Cc: Jan Kara, Oliver Sang, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, David Bueso On Wed, Mar 19, 2025 at 09:55:11AM -0700, Luis Chamberlain wrote: > FWIW, I'm not seeing this crash or any kernel splat within the > same time (I'll let this run the full 2.5 hours now to verify) on > vanilla 6.14.0-rc3 + the 64k-sector-size patches, which would explain why I > hadn't seen this in my earlier testing over 10 ext4 profiles on fstests. This > particular crash seems likely to be an artifact on the development cycle on > next-20250317. I confirm that with a vanilla 6.14.0-rc3 + the 64k-sector-size patches a 2.5 hour run generic/750 doesn't crash at all. So indeed something on the development cycle leads to this particular crash. Luis ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c 2025-03-19 19:16 ` Luis Chamberlain @ 2025-03-19 19:24 ` Matthew Wilcox 2025-03-20 12:11 ` Luis Chamberlain 0 siblings, 1 reply; 31+ messages in thread From: Matthew Wilcox @ 2025-03-19 19:24 UTC (permalink / raw) To: Luis Chamberlain Cc: Jan Kara, Oliver Sang, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, David Bueso On Wed, Mar 19, 2025 at 12:16:41PM -0700, Luis Chamberlain wrote: > On Wed, Mar 19, 2025 at 09:55:11AM -0700, Luis Chamberlain wrote: > > FWIW, I'm not seeing this crash or any kernel splat within the > > same time (I'll let this run the full 2.5 hours now to verify) on > > vanilla 6.14.0-rc3 + the 64k-sector-size patches, which would explain why I > > hadn't seen this in my earlier testing over 10 ext4 profiles on fstests. This > > particular crash seems likely to be an artifact on the development cycle on > > next-20250317. > > I confirm that with a vanilla 6.14.0-rc3 + the 64k-sector-size patches a 2.5 > hour run generic/750 doesn't crash at all. So indeed something on the > development cycle leads to this particular crash. We can't debug two problems at once. For the first problem, I've demonstrated what the cause is, and that's definitely introduced by your patch, so we need to figure out a solution. For the second problem, we don't know what it is. Do you want to bisect it to figure out which commit introduced it? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c 2025-03-19 19:24 ` Matthew Wilcox @ 2025-03-20 12:11 ` Luis Chamberlain 2025-03-20 12:18 ` Luis Chamberlain 2025-03-22 23:14 ` Johannes Weiner 0 siblings, 2 replies; 31+ messages in thread From: Luis Chamberlain @ 2025-03-20 12:11 UTC (permalink / raw) To: Matthew Wilcox, Johannes Weiner Cc: Jan Kara, Oliver Sang, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, David Bueso On Wed, Mar 19, 2025 at 07:24:23PM +0000, Matthew Wilcox wrote: > On Wed, Mar 19, 2025 at 12:16:41PM -0700, Luis Chamberlain wrote: > > On Wed, Mar 19, 2025 at 09:55:11AM -0700, Luis Chamberlain wrote: > > > FWIW, I'm not seeing this crash or any kernel splat within the > > > same time (I'll let this run the full 2.5 hours now to verify) on > > > vanilla 6.14.0-rc3 + the 64k-sector-size patches, which would explain why I > > > hadn't seen this in my earlier testing over 10 ext4 profiles on fstests. This > > > particular crash seems likely to be an artifact on the development cycle on > > > next-20250317. > > > > I confirm that with a vanilla 6.14.0-rc3 + the 64k-sector-size patches a 2.5 > > hour run generic/750 doesn't crash at all. So indeed something on the > > development cycle leads to this particular crash. > > We can't debug two problems at once. > > FOr the first problem, I've demonstrated what the cause is, and that's > definitely introduced by your patch, so we need to figure out a > solution. Sure, yeah I followed that. > For the second problem, we don't know what it is. Do you want to bisect > it to figure out which commit introduced it? Sure, the culprit is the patch titled: mm: page_alloc: trace type pollution from compaction capturing Johannes, any ideas? 
You can reproduce easily (1-2 minutes) by running fstests against ext4 with a 4k block size filesystem on linux-next against the test generic/750. Below is the splat decoded. Mar 20 11:52:55 extra-ext4-4k kernel: Linux version 6.14.0-rc6+ (mcgrof@beefy) (gcc (Debian 14.2.0-16) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #51 SMP PREEMPT_DYNAMIC Thu Mar 20 11:50:32 UTC 2025 Mar 20 11:52:55 extra-ext4-4k kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.14.0-rc6+ root=PARTUUID=503fa6f2-2d5b-4d7e-8cf8-3a811de326ce ro console=tty0 console=tty1 console=ttyS0,115200n8 console=ttyS0 < -- etc --> Mar 20 11:55:27 extra-ext4-4k unknown: run fstests generic/750 at 2025-03-20 11:55:27 Mar 20 11:55:28 extra-ext4-4k kernel: EXT4-fs (loop5): mounted filesystem c20cbdee-a370-4743-80aa-95dec0beaaa2 r/w with ordered data mode. Quota mode: none. Mar 20 11:56:29 extra-ext4-4k kernel: BUG: unable to handle page fault for address: ffff93098000ba00 Mar 20 11:56:29 extra-ext4-4k kernel: #PF: supervisor read access in kernel mode Mar 20 11:56:29 extra-ext4-4k kernel: #PF: error_code(0x0000) - not-present page Mar 20 11:56:29 extra-ext4-4k kernel: PGD 3a201067 P4D 3a201067 PUD 0 Mar 20 11:56:29 extra-ext4-4k kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI Mar 20 11:56:29 extra-ext4-4k kernel: CPU: 0 UID: 0 PID: 74 Comm: kcompactd0 Not tainted 6.14.0-rc6+ #51 Mar 20 11:56:29 extra-ext4-4k kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.11-5 01/28/2025 Mar 20 11:56:29 extra-ext4-4k kernel: RIP: 0010:__zone_watermark_ok (mm/page_alloc.c:3256) Mar 20 11:56:29 extra-ext4-4k kernel: Code: 00 00 00 41 f7 c0 38 02 00 00 0f 85 2c 01 00 00 48 8b 4f 30 48 63 d2 48 01 ca 85 db 0f 84 f3 00 00 00 49 29 d1 bb 80 00 00 00 <4c> 03 54 f7 38 31 d2 4d 39 ca 0f 8d d2 00 00 00 ba 01 00 00 00 85 All code ======== 0: 00 00 add %al,(%rax) 2: 00 41 f7 add %al,-0x9(%rcx) 5: c0 38 02 sarb $0x2,(%rax) 8: 00 00 add %al,(%rax) a: 0f 85 2c 01 00 00 jne 0x13c 10: 48 8b 4f 30 mov 0x30(%rdi),%rcx 14: 
48 63 d2 movslq %edx,%rdx 17: 48 01 ca add %rcx,%rdx 1a: 85 db test %ebx,%ebx 1c: 0f 84 f3 00 00 00 je 0x115 22: 49 29 d1 sub %rdx,%r9 25: bb 80 00 00 00 mov $0x80,%ebx 2a:* 4c 03 54 f7 38 add 0x38(%rdi,%rsi,8),%r10 <-- trapping instruction 2f: 31 d2 xor %edx,%edx 31: 4d 39 ca cmp %r9,%r10 34: 0f 8d d2 00 00 00 jge 0x10c 3a: ba 01 00 00 00 mov $0x1,%edx 3f: 85 .byte 0x85 Code starting with the faulting instruction =========================================== 0: 4c 03 54 f7 38 add 0x38(%rdi,%rsi,8),%r10 5: 31 d2 xor %edx,%edx 7: 4d 39 ca cmp %r9,%r10 a: 0f 8d d2 00 00 00 jge 0xe2 10: ba 01 00 00 00 mov $0x1,%edx 15: 85 .byte 0x85 Mar 20 11:56:29 extra-ext4-4k kernel: RSP: 0018:ffffa5bb002b7c78 EFLAGS: 00010206 Mar 20 11:56:29 extra-ext4-4k kernel: RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000 Mar 20 11:56:29 extra-ext4-4k kernel: RDX: 0000000000000000 RSI: 0000000000002431 RDI: ffff93097fff9840 Mar 20 11:56:29 extra-ext4-4k kernel: RBP: 0000000000000009 R08: 0000000000000080 R09: 0000000000005e90 Mar 20 11:56:29 extra-ext4-4k kernel: R10: 0000000000000c8e R11: 0000000000000c8e R12: 0000000000000000 Mar 20 11:56:29 extra-ext4-4k kernel: R13: 0000000000002431 R14: 0000000000000002 R15: 0000000000000000 Mar 20 11:56:29 extra-ext4-4k kernel: FS: 0000000000000000(0000) GS:ffff93097bc00000(0000) knlGS:0000000000000000 Mar 20 11:56:29 extra-ext4-4k kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Mar 20 11:56:29 extra-ext4-4k kernel: CR2: ffff93098000ba00 CR3: 000000010c602004 CR4: 0000000000772ef0 Mar 20 11:56:29 extra-ext4-4k kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Mar 20 11:56:29 extra-ext4-4k kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Mar 20 11:56:29 extra-ext4-4k kernel: PKRU: 55555554 Mar 20 11:56:29 extra-ext4-4k kernel: Call Trace: Mar 20 11:56:29 extra-ext4-4k kernel: <TASK> Mar 20 11:56:29 extra-ext4-4k kernel: ? 
__die_body.cold (arch/x86/kernel/dumpstack.c:478 (discriminator 1) arch/x86/kernel/dumpstack.c:465 (discriminator 1) arch/x86/kernel/dumpstack.c:420 (discriminator 1)) Mar 20 11:56:29 extra-ext4-4k kernel: ? page_fault_oops (arch/x86/mm/fault.c:710 (discriminator 1)) Mar 20 11:56:29 extra-ext4-4k kernel: ? search_module_extables (kernel/module/main.c:3733 (discriminator 3)) Mar 20 11:56:29 extra-ext4-4k kernel: ? __zone_watermark_ok (mm/page_alloc.c:3256) Mar 20 11:56:29 extra-ext4-4k kernel: ? search_bpf_extables (kernel/bpf/core.c:804) Mar 20 11:56:29 extra-ext4-4k kernel: ? exc_page_fault (arch/x86/mm/fault.c:1182 (discriminator 1) arch/x86/mm/fault.c:1478 (discriminator 1) arch/x86/mm/fault.c:1538 (discriminator 1)) Mar 20 11:56:29 extra-ext4-4k kernel: ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:574) Mar 20 11:56:29 extra-ext4-4k kernel: ? __zone_watermark_ok (mm/page_alloc.c:3256) Mar 20 11:56:29 extra-ext4-4k kernel: ? asm_sysvec_call_function_single (./arch/x86/include/asm/idtentry.h:574) Mar 20 11:56:29 extra-ext4-4k kernel: compaction_suitable (mm/compaction.c:2438) Mar 20 11:56:29 extra-ext4-4k kernel: compaction_suit_allocation_order (mm/compaction.c:2525 (discriminator 1)) Mar 20 11:56:29 extra-ext4-4k kernel: kcompactd_do_work (mm/compaction.c:3106) Mar 20 11:56:29 extra-ext4-4k kernel: kcompactd (mm/compaction.c:3220) Mar 20 11:56:29 extra-ext4-4k kernel: ? __pfx_autoremove_wake_function (kernel/sched/wait.c:383) Mar 20 11:56:29 extra-ext4-4k kernel: ? __pfx_kcompactd (mm/compaction.c:3184) Mar 20 11:56:29 extra-ext4-4k kernel: kthread (kernel/kthread.c:464) Mar 20 11:56:29 extra-ext4-4k kernel: ? __pfx_kthread (kernel/kthread.c:413) Mar 20 11:56:29 extra-ext4-4k kernel: ? _raw_spin_unlock (./include/linux/spinlock_api_smp.h:143 (discriminator 3) kernel/locking/spinlock.c:186 (discriminator 3)) Mar 20 11:56:29 extra-ext4-4k kernel: ? 
finish_task_switch.isra.0 (./arch/x86/include/asm/paravirt.h:691 kernel/sched/sched.h:1533 kernel/sched/core.c:5132 kernel/sched/core.c:5250) Mar 20 11:56:29 extra-ext4-4k kernel: ? __pfx_kthread (kernel/kthread.c:413) Mar 20 11:56:29 extra-ext4-4k kernel: ret_from_fork (arch/x86/kernel/process.c:148) Mar 20 11:56:29 extra-ext4-4k kernel: ? __pfx_kthread (kernel/kthread.c:413) Mar 20 11:56:29 extra-ext4-4k kernel: ret_from_fork_asm (arch/x86/entry/entry_64.S:257) Mar 20 11:56:29 extra-ext4-4k kernel: </TASK> Mar 20 11:56:29 extra-ext4-4k kernel: Modules linked in: loop sunrpc 9p nls_iso8859_1 nls_cp437 vfat crc32c_generic fat kvm_intel kvm ghash_clmulni_intel sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 aesni_intel gf128mul crypto_simd 9pnet_virtio cryptd virtio_console virtio_balloon button evdev joydev serio_raw dm_mod nvme_fabrics drm nvme_core nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vsock autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 md_mod virtio_net net_failover failover virtio_blk psmouse virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring Mar 20 11:56:29 extra-ext4-4k kernel: CR2: ffff93098000ba00 Mar 20 11:56:29 extra-ext4-4k kernel: ---[ end trace 0000000000000000 ]--- Mar 20 11:56:29 extra-ext4-4k kernel: RIP: 0010:__zone_watermark_ok (mm/page_alloc.c:3256) Mar 20 11:56:29 extra-ext4-4k kernel: Code: 00 00 00 41 f7 c0 38 02 00 00 0f 85 2c 01 00 00 48 8b 4f 30 48 63 d2 48 01 ca 85 db 0f 84 f3 00 00 00 49 29 d1 bb 80 00 00 00 <4c> 03 54 f7 38 31 d2 4d 39 ca 0f 8d d2 00 00 00 ba 01 00 00 00 85 All code ======== 0: 00 00 add %al,(%rax) 2: 00 41 f7 add %al,-0x9(%rcx) 5: c0 38 02 sarb $0x2,(%rax) 8: 00 00 add %al,(%rax) a: 0f 85 2c 01 00 00 jne 0x13c 10: 48 8b 4f 30 mov 0x30(%rdi),%rcx 14: 48 63 d2 movslq %edx,%rdx 17: 48 01 ca add %rcx,%rdx 1a: 85 db test %ebx,%ebx 1c: 0f 84 f3 00 00 00 je 0x115 
22: 49 29 d1 sub %rdx,%r9 25: bb 80 00 00 00 mov $0x80,%ebx 2a:* 4c 03 54 f7 38 add 0x38(%rdi,%rsi,8),%r10 <-- trapping instruction 2f: 31 d2 xor %edx,%edx 31: 4d 39 ca cmp %r9,%r10 34: 0f 8d d2 00 00 00 jge 0x10c 3a: ba 01 00 00 00 mov $0x1,%edx 3f: 85 .byte 0x85 Code starting with the faulting instruction =========================================== 0: 4c 03 54 f7 38 add 0x38(%rdi,%rsi,8),%r10 5: 31 d2 xor %edx,%edx 7: 4d 39 ca cmp %r9,%r10 a: 0f 8d d2 00 00 00 jge 0xe2 10: ba 01 00 00 00 mov $0x1,%edx 15: 85 .byte 0x85 Mar 20 11:56:29 extra-ext4-4k kernel: RSP: 0018:ffffa5bb002b7c78 EFLAGS: 00010206 Mar 20 11:56:29 extra-ext4-4k kernel: RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000 Mar 20 11:56:29 extra-ext4-4k kernel: RDX: 0000000000000000 RSI: 0000000000002431 RDI: ffff93097fff9840 Mar 20 11:56:29 extra-ext4-4k kernel: RBP: 0000000000000009 R08: 0000000000000080 R09: 0000000000005e90 Mar 20 11:56:29 extra-ext4-4k kernel: R10: 0000000000000c8e R11: 0000000000000c8e R12: 0000000000000000 Mar 20 11:56:29 extra-ext4-4k kernel: R13: 0000000000002431 R14: 0000000000000002 R15: 0000000000000000 Mar 20 11:56:29 extra-ext4-4k kernel: FS: 0000000000000000(0000) GS:ffff93097bc00000(0000) knlGS:0000000000000000 Mar 20 11:56:29 extra-ext4-4k kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Mar 20 11:56:29 extra-ext4-4k kernel: CR2: ffff93098000ba00 CR3: 000000010c602004 CR4: 0000000000772ef0 Mar 20 11:56:29 extra-ext4-4k kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Mar 20 11:56:29 extra-ext4-4k kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Mar 20 11:56:29 extra-ext4-4k kernel: PKRU: 55555554 Mar 20 11:56:29 extra-ext4-4k kernel: note: kcompactd0[74] exited with irqs disabled Luis ^ permalink raw reply [flat|nested] 31+ messages in thread
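For anyone trying to recreate the setup Luis describes above, a minimal fstests `local.config` for an ext4 4k-block-size run can be generated along these lines. This is only a sketch: the section name, device paths, and mount points are assumptions, not taken from the report; adjust them for your own machine.

```shell
make_fstests_config() {
    # Emit a minimal fstests local.config for an ext4 run with a
    # 4k block size. $1 is the test device, $2 the scratch device.
    # All paths and the section name are placeholders.
    cat <<EOF
[ext4_4k]
FSTYP=ext4
TEST_DEV=$1
TEST_DIR=/mnt/test
SCRATCH_DEV=$2
SCRATCH_MNT=/mnt/scratch
MKFS_OPTIONS="-b 4096"
EOF
}
```

With a config like this in place, the reproducer discussed in this thread would be `./check generic/750` from an fstests checkout (hypothetical invocation; see the fstests README for the harness details).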
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c 2025-03-20 12:11 ` Luis Chamberlain @ 2025-03-20 12:18 ` Luis Chamberlain 2025-03-22 23:14 ` Johannes Weiner 1 sibling, 0 replies; 31+ messages in thread From: Luis Chamberlain @ 2025-03-20 12:18 UTC (permalink / raw) To: Matthew Wilcox, Johannes Weiner Cc: Jan Kara, Oliver Sang, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, David Bueso On Thu, Mar 20, 2025 at 05:11:21AM -0700, Luis Chamberlain wrote: > Sure, the culprit is the patch titled: > > mm: page_alloc: trace type pollution from compaction capturing Sorry.. that's incorrect, the right title is: mm: compaction: push watermark into compaction_suitable() callers Luis ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c 2025-03-20 12:11 ` Luis Chamberlain 2025-03-20 12:18 ` Luis Chamberlain @ 2025-03-22 23:14 ` Johannes Weiner 2025-03-23 1:02 ` Luis Chamberlain 1 sibling, 1 reply; 31+ messages in thread From: Johannes Weiner @ 2025-03-22 23:14 UTC (permalink / raw) To: Luis Chamberlain Cc: Matthew Wilcox, Jan Kara, Oliver Sang, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, David Bueso Hey Luis, On Thu, Mar 20, 2025 at 05:11:19AM -0700, Luis Chamberlain wrote: > On Wed, Mar 19, 2025 at 07:24:23PM +0000, Matthew Wilcox wrote: > > On Wed, Mar 19, 2025 at 12:16:41PM -0700, Luis Chamberlain wrote: > > > On Wed, Mar 19, 2025 at 09:55:11AM -0700, Luis Chamberlain wrote: > > > > FWIW, I'm not seeing this crash or any kernel splat within the > > > > same time (I'll let this run the full 2.5 hours now to verify) on > > > > vanilla 6.14.0-rc3 + the 64k-sector-size patches, which would explain why I > > > > hadn't seen this in my earlier testing over 10 ext4 profiles on fstests. This > > > > particular crash seems likely to be an artifact on the development cycle on > > > > next-20250317. > > > > > > I confirm that with a vanilla 6.14.0-rc3 + the 64k-sector-size patches a 2.5 > > > hour run generic/750 doesn't crash at all. So indeed something on the > > > development cycle leads to this particular crash. > > > > We can't debug two problems at once. > > > > FOr the first problem, I've demonstrated what the cause is, and that's > > definitely introduced by your patch, so we need to figure out a > > solution. > > Sure, yeah I followed that. > > > For the second problem, we don't know what it is. Do you want to bisect > > it to figure out which commit introduced it? 
> > Sure, the culprit is the patch titled: > > mm: page_alloc: trace type pollution from compaction capturing > > Johannes, any ideas? You can reproduce easily (1-2 minutes) by running > fstests against ext4 with a 4k block size filesystem on linux-next > against the test generic/750. Sorry for the late reply, I just saw your emails now. > Below is the splat decoded. > > Mar 20 11:52:55 extra-ext4-4k kernel: Linux version 6.14.0-rc6+ (mcgrof@beefy) (gcc (Debian 14.2.0-16) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #51 SMP PREEMPT_DYNAMIC Thu Mar 20 11:50:32 UTC 2025 > Mar 20 11:52:55 extra-ext4-4k kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.14.0-rc6+ root=PARTUUID=503fa6f2-2d5b-4d7e-8cf8-3a811de326ce ro console=tty0 console=tty1 console=ttyS0,115200n8 console=ttyS0 > > < -- etc --> > > Mar 20 11:55:27 extra-ext4-4k unknown: run fstests generic/750 at 2025-03-20 11:55:27 > Mar 20 11:55:28 extra-ext4-4k kernel: EXT4-fs (loop5): mounted filesystem c20cbdee-a370-4743-80aa-95dec0beaaa2 r/w with ordered data mode. Quota mode: none. 
> Mar 20 11:56:29 extra-ext4-4k kernel: BUG: unable to handle page fault for address: ffff93098000ba00 > Mar 20 11:56:29 extra-ext4-4k kernel: #PF: supervisor read access in kernel mode > Mar 20 11:56:29 extra-ext4-4k kernel: #PF: error_code(0x0000) - not-present page > Mar 20 11:56:29 extra-ext4-4k kernel: PGD 3a201067 P4D 3a201067 PUD 0 > Mar 20 11:56:29 extra-ext4-4k kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI > Mar 20 11:56:29 extra-ext4-4k kernel: CPU: 0 UID: 0 PID: 74 Comm: kcompactd0 Not tainted 6.14.0-rc6+ #51 > Mar 20 11:56:29 extra-ext4-4k kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.11-5 01/28/2025 > Mar 20 11:56:29 extra-ext4-4k kernel: RIP: 0010:__zone_watermark_ok (mm/page_alloc.c:3256) > Mar 20 11:56:29 extra-ext4-4k kernel: Code: 00 00 00 41 f7 c0 38 02 00 00 0f 85 2c 01 00 00 48 8b 4f 30 48 63 d2 48 01 ca 85 db 0f 84 f3 00 00 00 49 29 d1 bb 80 00 00 00 <4c> 03 54 f7 38 31 d2 4d 39 ca 0f 8d d2 00 00 00 ba 01 00 00 00 85 > All code > ======== > 0: 00 00 add %al,(%rax) > 2: 00 41 f7 add %al,-0x9(%rcx) > 5: c0 38 02 sarb $0x2,(%rax) > 8: 00 00 add %al,(%rax) > a: 0f 85 2c 01 00 00 jne 0x13c > 10: 48 8b 4f 30 mov 0x30(%rdi),%rcx > 14: 48 63 d2 movslq %edx,%rdx > 17: 48 01 ca add %rcx,%rdx > 1a: 85 db test %ebx,%ebx > 1c: 0f 84 f3 00 00 00 je 0x115 > 22: 49 29 d1 sub %rdx,%r9 > 25: bb 80 00 00 00 mov $0x80,%ebx > 2a:* 4c 03 54 f7 38 add 0x38(%rdi,%rsi,8),%r10 <-- trapping instruction This looks like the same issue the bot reported here: https://lore.kernel.org/all/20250321135524.GA1888695@cmpxchg.org/ There is a fix for it queued in next-20250318 and later. Could you please double check with your reproducer against a more recent next? Thanks ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c 2025-03-22 23:14 ` Johannes Weiner @ 2025-03-23 1:02 ` Luis Chamberlain 2025-03-23 7:07 ` Luis Chamberlain 0 siblings, 1 reply; 31+ messages in thread From: Luis Chamberlain @ 2025-03-23 1:02 UTC (permalink / raw) To: Johannes Weiner Cc: Matthew Wilcox, Jan Kara, Oliver Sang, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, David Bueso On Sat, Mar 22, 2025 at 07:14:40PM -0400, Johannes Weiner wrote: > Hey Luis, > > This looks like the same issue the bot reported here: > > https://lore.kernel.org/all/20250321135524.GA1888695@cmpxchg.org/ > > There is a fix for it queued in next-20250318 and later. Could you > please double check with your reproducer against a more recent next? Confirmed, at least it's been 30 minutes and no crashes now, whereas before it would crash in 1 minute. I'll let it soak for 2.5 hours in the hopes I can trigger the warning originally reported by this thread. Even though from code inspection I see how the kernel warning would trigger, I just want to force trigger it on a test, and I can't yet. Luis ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c 2025-03-23 1:02 ` Luis Chamberlain @ 2025-03-23 7:07 ` Luis Chamberlain 2025-03-25 6:52 ` Oliver Sang 0 siblings, 1 reply; 31+ messages in thread From: Luis Chamberlain @ 2025-03-23 7:07 UTC (permalink / raw) To: Johannes Weiner, Oliver Sang Cc: Matthew Wilcox, Jan Kara, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, David Bueso On Sat, Mar 22, 2025 at 06:02:13PM -0700, Luis Chamberlain wrote: > On Sat, Mar 22, 2025 at 07:14:40PM -0400, Johannes Weiner wrote: > > Hey Luis, > > > > This looks like the same issue the bot reported here: > > > > https://lore.kernel.org/all/20250321135524.GA1888695@cmpxchg.org/ > > > > There is a fix for it queued in next-20250318 and later. Could you > > please double check with your reproducer against a more recent next? > > Confirmed, at least it's been 30 minutes and no crashes now where as > > before it would crash in 1 minute. I'll let it soak for 2.5 hours in > > the hopes I can trigger the warning originally reported by this thread. > > > > Even though from code inspection I see how the kernel warning would > > trigger I just want to force trigger it on a test, and I can't yet. Survived 5 hours now. This certainly fixed that crash. As for the kernel warning, I can't yet reproduce that, so I'm trying to run generic/750 forever and looping ./testcases/kernel/syscalls/close_range/close_range01 and yet nothing. Oliver, can you reproduce the kernel warning on next-20250321? Luis ^ permalink raw reply [flat|nested] 31+ messages in thread
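When a warning only fires in some fraction of runs, as Oliver reports below, a tiny wrapper can put a number on the failure rate instead of an open-ended `while true` loop. A sketch (the LTP binary path used in this thread is just what you would pass in as the command):

```shell
run_rate() {
    # Run the given command N times and report how many runs failed,
    # e.g. "7/12 failed". First argument is the run count; the rest
    # is the command to run, such as
    # ./testcases/kernel/syscalls/close_range/close_range01
    n=$1; shift
    fails=0 i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1 || fails=$((fails + 1))
        i=$((i + 1))
    done
    echo "$fails/$n failed"
}
```

For example, `run_rate 12 ./testcases/kernel/syscalls/close_range/close_range01` would answer Luis's earlier question about whether the failure is deterministic or probabilistic.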
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c 2025-03-23 7:07 ` Luis Chamberlain @ 2025-03-25 6:52 ` Oliver Sang 2025-03-28 1:44 ` Luis Chamberlain 0 siblings, 1 reply; 31+ messages in thread From: Oliver Sang @ 2025-03-25 6:52 UTC (permalink / raw) To: Luis Chamberlain Cc: Johannes Weiner, Matthew Wilcox, Jan Kara, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, David Bueso, oliver.sang [-- Attachment #1: Type: text/plain, Size: 6570 bytes --] hi, Luis, On Sun, Mar 23, 2025 at 12:07:27AM -0700, Luis Chamberlain wrote: > On Sat, Mar 22, 2025 at 06:02:13PM -0700, Luis Chamberlain wrote: > > On Sat, Mar 22, 2025 at 07:14:40PM -0400, Johannes Weiner wrote: > > > Hey Luis, > > > > > > This looks like the same issue the bot reported here: > > > > > > https://lore.kernel.org/all/20250321135524.GA1888695@cmpxchg.org/ > > > > > > There is a fix for it queued in next-20250318 and later. Could you > > > please double check with your reproducer against a more recent next? > > > > Confirmed, at least it's been 30 minutes and no crashes now where as > > before it would crash in 1 minute. I'll let it soak for 2.5 hours in > > the hopes I can trigger the warning originally reported by this thread. > > > > Even though from code inspection I see how the kernel warning would > > trigger I just want to force trigger it on a test, and I can't yet. > > Survied 5 hours now. This certainly fixed that crash. > > As for the kernel warning, I can't yet reproduce that, so trying to > run generic/750 forever and looping > ./testcases/kernel/syscalls/close_range/close_range01 > and yet nothing. > > Oliver can you reproduce the kernel warning on next-20250321 ? 
the issue still exists on 9388ec571cb1ad (tag: next-20250321, linux-next/master) "Add linux-next specific files for 20250321", but only randomly: it reproduced in 7 of 12 runs, in which ltp.close_range01 also failed; in the other 5 runs the issue did not reproduce and ltp.close_range01 passed. one dmesg is attached FYI.
kern :err : [ 215.378500] BUG: sleeping function called from invalid context at mm/util.c:743
kern :err : [ 215.386652] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 52, name: kcompactd0
kern :err : [ 215.395438] preempt_count: 1, expected: 0
kern :err : [ 215.400216] RCU nest depth: 0, expected: 0
kern :warn : [ 215.405081] CPU: 0 UID: 0 PID: 52 Comm: kcompactd0 Tainted: G S 6.14.0-rc7-next-20250321 #1 PREEMPT(voluntary)
kern :warn : [ 215.405095] Tainted: [S]=CPU_OUT_OF_SPEC
kern :warn : [ 215.405097] Hardware name: Hewlett-Packard HP Pro 3340 MT/17A1, BIOS 8.07 01/24/2013
kern :warn : [ 215.405101] Call Trace:
kern :warn : [ 215.405104] <TASK>
kern :warn : [ 215.405107] dump_stack_lvl+0x4f/0x70
kern :warn : [ 215.405118] __might_resched+0x2c6/0x450
kern :warn : [ 215.405128] folio_mc_copy+0xca/0x1f0
kern :warn : [ 215.405137] ? _raw_spin_lock+0x80/0xe0
kern :warn : [ 215.405145] __migrate_folio+0x117/0x2e0
kern :warn : [ 215.405154] __buffer_migrate_folio+0x563/0x670
kern :warn : [ 215.405161] move_to_new_folio+0xf5/0x410
kern :warn : [ 215.405168] migrate_folio_move+0x210/0x770
kern :warn : [ 215.405173] ? __pfx_compaction_free+0x10/0x10
kern :warn : [ 215.405181] ? __pfx_migrate_folio_move+0x10/0x10
kern :warn : [ 215.405187] ? compaction_alloc_noprof+0x441/0x720
kern :warn : [ 215.405195] ? __pfx_compaction_alloc+0x10/0x10
kern :warn : [ 215.405202] ? __pfx_compaction_free+0x10/0x10
kern :warn : [ 215.405208] ? __pfx_compaction_free+0x10/0x10
kern :warn : [ 215.405213] ? migrate_folio_unmap+0x329/0x890
kern :warn : [ 215.405221] migrate_pages_batch+0xe67/0x1800
kern :warn : [ 215.405227] ? __pfx_compaction_free+0x10/0x10
kern :warn : [ 215.405236] ? __pfx_migrate_pages_batch+0x10/0x10
kern :warn : [ 215.405243] ? pick_next_task_fair+0x304/0xba0
kern :warn : [ 215.405253] ? finish_task_switch+0x155/0x750
kern :warn : [ 215.405260] ? __switch_to+0x5ba/0x1020
kern :warn : [ 215.405268] migrate_pages_sync+0x10b/0x8e0
kern :warn : [ 215.405275] ? __pfx_compaction_alloc+0x10/0x10
kern :warn : [ 215.405281] ? __pfx_compaction_free+0x10/0x10
kern :warn : [ 215.405289] ? __pfx_migrate_pages_sync+0x10/0x10
kern :warn : [ 215.405295] ? set_pfnblock_flags_mask+0x178/0x220
kern :warn : [ 215.405303] ? __pfx_lru_gen_del_folio+0x10/0x10
kern :warn : [ 215.405310] ? __pfx_compaction_alloc+0x10/0x10
kern :warn : [ 215.405316] ? __pfx_compaction_free+0x10/0x10
kern :warn : [ 215.405323] migrate_pages+0x842/0xe30
kern :warn : [ 215.405331] ? __pfx_compaction_alloc+0x10/0x10
kern :warn : [ 215.405337] ? __pfx_compaction_free+0x10/0x10
kern :warn : [ 215.405345] ? __pfx_migrate_pages+0x10/0x10
kern :warn : [ 215.405351] ? __compact_finished+0x91b/0xbd0
kern :warn : [ 215.405359] ? isolate_migratepages+0x32d/0xbd0
kern :warn : [ 215.405367] compact_zone+0x9df/0x16c0
kern :warn : [ 215.405377] ? __pfx_compact_zone+0x10/0x10
kern :warn : [ 215.405383] ? _raw_spin_lock_irqsave+0x86/0xe0
kern :warn : [ 215.405390] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
kern :warn : [ 215.405397] compact_node+0x158/0x250
kern :warn : [ 215.405405] ? __pfx_compact_node+0x10/0x10
kern :warn : [ 215.405416] ? __pfx_extfrag_for_order+0x10/0x10
kern :warn : [ 215.405425] ? __pfx_mutex_unlock+0x10/0x10
kern :warn : [ 215.405432] ? finish_wait+0xd1/0x280
kern :warn : [ 215.405441] kcompactd+0x5d0/0xa30
kern :warn : [ 215.405450] ? __pfx_kcompactd+0x10/0x10
kern :warn : [ 215.405456] ? _raw_spin_lock_irqsave+0x86/0xe0
kern :warn : [ 215.405462] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
kern :warn : [ 215.405469] ? __pfx_autoremove_wake_function+0x10/0x10
kern :warn : [ 215.405477] ? __kthread_parkme+0xba/0x1e0
kern :warn : [ 215.405485] ? __pfx_kcompactd+0x10/0x10
kern :warn : [ 215.405492] kthread+0x3a0/0x770
kern :warn : [ 215.405498] ? __pfx_kthread+0x10/0x10
kern :warn : [ 215.405504] ? __pfx_kthread+0x10/0x10
kern :warn : [ 215.405510] ret_from_fork+0x30/0x70
kern :warn : [ 215.405516] ? __pfx_kthread+0x10/0x10
kern :warn : [ 215.405521] ret_from_fork_asm+0x1a/0x30
kern :warn : [ 215.405530] </TASK>
user :notice: [ 216.962224] Modules Loaded netconsole btrfs blake2b_generic xor zstd_compress raid6_pq snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp i915 sd_mod sg kvm_intel intel_gtt ipmi_devintf ipmi_msghandler cec kvm drm_buddy snd_hda_intel snd_intel_dspcfg ttm snd_intel_sdw_acpi ghash_clmulni_intel drm_display_helper snd_hda_codec rapl drm_client_lib intel_cstate snd_hda_core drm_kms_helper ahci snd_hwdep libahci snd_pcm wmi_bmof mei_me video intel_uncore mei lpc_ich libata snd_timer pcspkr snd i2c_i801 i2c_smbus soundcore wmi binfmt_misc loop drm fuse dm_mod ip_tables
> 
> Luis
[-- Attachment #2: kmsg.xz --]
[-- Type: application/x-xz, Size: 31488 bytes --]
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-25 6:52   ` Oliver Sang
@ 2025-03-28 1:44     ` Luis Chamberlain
  2025-03-28 4:21       ` Luis Chamberlain
  0 siblings, 1 reply; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-28 1:44 UTC (permalink / raw)
To: Oliver Sang
Cc: Johannes Weiner, Matthew Wilcox, Jan Kara, David Hildenbrand,
    Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke,
    oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav,
    Daniel Gomez, David Bueso

On Tue, Mar 25, 2025 at 02:52:49PM +0800, Oliver Sang wrote:
> hi, Luis,
>
> On Sun, Mar 23, 2025 at 12:07:27AM -0700, Luis Chamberlain wrote:
> > On Sat, Mar 22, 2025 at 06:02:13PM -0700, Luis Chamberlain wrote:
> > > On Sat, Mar 22, 2025 at 07:14:40PM -0400, Johannes Weiner wrote:
> > > > Hey Luis,
> > > >
> > > > This looks like the same issue the bot reported here:
> > > >
> > > > https://lore.kernel.org/all/20250321135524.GA1888695@cmpxchg.org/
> > > >
> > > > There is a fix for it queued in next-20250318 and later. Could you
> > > > please double check with your reproducer against a more recent next?
> > >
> > > Confirmed, at least it's been 30 minutes and no crashes now, whereas
> > > before it would crash in 1 minute. I'll let it soak for 2.5 hours in
> > > the hopes I can trigger the warning originally reported by this thread.
> > >
> > > Even though from code inspection I see how the kernel warning would
> > > trigger, I just want to force trigger it on a test, and I can't yet.
> >
> > Survived 5 hours now. This certainly fixed that crash.
> >
> > As for the kernel warning, I can't yet reproduce that, so trying to
> > run generic/750 forever and looping
> > ./testcases/kernel/syscalls/close_range/close_range01
> > and yet nothing.
> >
> > Oliver can you reproduce the kernel warning on next-20250321 ?
>
> the issue still exists on
> 9388ec571cb1ad (tag: next-20250321, linux-next/master) Add linux-next specific files for 20250321
>
> but randomly (reproduced 7 times in 12 runs, where ltp.close_range01 also failed;
> in the other 5 runs, the issue cannot be reproduced and ltp.close_range01 passes)

OK I narrowed down a reproducer to requiring the patch below

diff --git a/mm/util.c b/mm/util.c
index 448117da071f..3585bdb8700a 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -735,6 +735,8 @@ int folio_mc_copy(struct folio *dst, struct folio *src)
 	long nr = folio_nr_pages(src);
 	long i = 0;
 
+	might_sleep();
+
 	for (;;) {
 		if (copy_mc_highpage(folio_page(dst, i), folio_page(src, i)))
 			return -EHWPOISON;

And then just running:

dd if=/dev/zero of=/dev/vde bs=1024M count=1024

For some reason a kernel with the following didn't trigger it, so the
above patch is needed:

CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_ACPI_SLEEP=y

It may have to do with my preemption settings:

CONFIG_PREEMPT_BUILD=y
CONFIG_ARCH_HAS_PREEMPT_LAZY=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
# CONFIG_PREEMPT_LAZY is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_RCU=y

And so now to see how we should fix it.

  Luis

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-28 1:44     ` Luis Chamberlain
@ 2025-03-28 4:21       ` Luis Chamberlain
  2025-03-28 9:47         ` Luis Chamberlain
  0 siblings, 1 reply; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-28 4:21 UTC (permalink / raw)
To: Jan Kara, Kefeng Wang
Cc: Johannes Weiner, Oliver Sang, Matthew Wilcox, David Hildenbrand,
    Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke,
    oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav,
    Daniel Gomez, David Bueso

On Thu, Mar 27, 2025 at 06:44:56PM -0700, Luis Chamberlain wrote:
> On Tue, Mar 25, 2025 at 02:52:49PM +0800, Oliver Sang wrote:
> > hi, Luis,
> >
> > On Sun, Mar 23, 2025 at 12:07:27AM -0700, Luis Chamberlain wrote:
> > > On Sat, Mar 22, 2025 at 06:02:13PM -0700, Luis Chamberlain wrote:
> > > > On Sat, Mar 22, 2025 at 07:14:40PM -0400, Johannes Weiner wrote:
> > > > > Hey Luis,
> > > > >
> > > > > This looks like the same issue the bot reported here:
> > > > >
> > > > > https://lore.kernel.org/all/20250321135524.GA1888695@cmpxchg.org/
> > > > >
> > > > > There is a fix for it queued in next-20250318 and later. Could you
> > > > > please double check with your reproducer against a more recent next?
> > > >
> > > > Confirmed, at least it's been 30 minutes and no crashes now, whereas
> > > > before it would crash in 1 minute. I'll let it soak for 2.5 hours in
> > > > the hopes I can trigger the warning originally reported by this thread.
> > > >
> > > > Even though from code inspection I see how the kernel warning would
> > > > trigger, I just want to force trigger it on a test, and I can't yet.
> > >
> > > Survived 5 hours now. This certainly fixed that crash.
> > >
> > > As for the kernel warning, I can't yet reproduce that, so trying to
> > > run generic/750 forever and looping
> > > ./testcases/kernel/syscalls/close_range/close_range01
> > > and yet nothing.
> > >
> > > Oliver can you reproduce the kernel warning on next-20250321 ?
> >
> > the issue still exists on
> > 9388ec571cb1ad (tag: next-20250321, linux-next/master) Add linux-next specific files for 20250321
> >
> > but randomly (reproduced 7 times in 12 runs, where ltp.close_range01 also failed;
> > in the other 5 runs, the issue cannot be reproduced and ltp.close_range01 passes)
>
> OK I narrowed down a reproducer to requiring the patch below
>
> diff --git a/mm/util.c b/mm/util.c
> index 448117da071f..3585bdb8700a 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -735,6 +735,8 @@ int folio_mc_copy(struct folio *dst, struct folio *src)
>  	long nr = folio_nr_pages(src);
>  	long i = 0;
>  
> +	might_sleep();
> +
>  	for (;;) {
>  		if (copy_mc_highpage(folio_page(dst, i), folio_page(src, i)))
>  			return -EHWPOISON;
>
> And then just running:
>
> dd if=/dev/zero of=/dev/vde bs=1024M count=1024
>
> For some reason a kernel with the following didn't trigger it, so the
> above patch is needed:
>
> CONFIG_PROVE_LOCKING=y
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_ACPI_SLEEP=y
>
> It may have to do with my preemption settings:
>
> CONFIG_PREEMPT_BUILD=y
> CONFIG_ARCH_HAS_PREEMPT_LAZY=y
> # CONFIG_PREEMPT_NONE is not set
> CONFIG_PREEMPT_VOLUNTARY=y
> # CONFIG_PREEMPT is not set
> # CONFIG_PREEMPT_LAZY is not set
> CONFIG_PREEMPT_COUNT=y
> CONFIG_PREEMPTION=y
> CONFIG_PREEMPT_DYNAMIC=y
> CONFIG_PREEMPT_RCU=y
>
> And so now to see how we should fix it.

Would the extra ref check added via commit 060913999d7a9e50 ("mm:
migrate: support poisoned recover from migrate folio") make the removal
of the spin lock safe now, given all the buffers are locked from the
folio? This survives some basic sanity checks on my end with
generic/750 against ext4, and also filling a drive at the same time with
fio. My feeling is we are not sure; do we have a reproducer for the
issue reported through ebdf4de5642fb6 ("mm: migrate: fix reference
check race between __find_get_block() and migration")? I suspect the
answer is no.

The only other thing I can think of at this time is to add the
lru_cache_disabled() || cpu_is_isolated(smp_processor_id()) checks on
__find_get_block_slow() as we do in bh_lru_install(), but I am not sure
if that suffices for the old races.

Thoughts?

diff --git a/mm/migrate.c b/mm/migrate.c
index 97f0edf0c032..6a5d125ecde9 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -859,12 +859,12 @@ static int __buffer_migrate_folio(struct address_space *mapping,
 		}
 		bh = bh->b_this_page;
 	} while (bh != head);
+	spin_unlock(&mapping->i_private_lock);
 	if (busy) {
 		if (invalidated) {
 			rc = -EAGAIN;
 			goto unlock_buffers;
 		}
-		spin_unlock(&mapping->i_private_lock);
 		invalidate_bh_lrus();
 		invalidated = true;
 		goto recheck_buffers;
@@ -882,8 +882,6 @@ static int __buffer_migrate_folio(struct address_space *mapping,
 	} while (bh != head);
 
 unlock_buffers:
-	if (check_refs)
-		spin_unlock(&mapping->i_private_lock);
 	bh = head;
 	do {
 		unlock_buffer(bh);
diff --git a/mm/util.c b/mm/util.c
index 448117da071f..3585bdb8700a 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -735,6 +735,8 @@ int folio_mc_copy(struct folio *dst, struct folio *src)
 	long nr = folio_nr_pages(src);
 	long i = 0;
 
+	might_sleep();
+
 	for (;;) {
 		if (copy_mc_highpage(folio_page(dst, i), folio_page(src, i)))
 			return -EHWPOISON;

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-28 4:21       ` Luis Chamberlain
@ 2025-03-28 9:47         ` Luis Chamberlain
  2025-03-28 19:09           ` Luis Chamberlain
  0 siblings, 1 reply; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-28 9:47 UTC (permalink / raw)
To: Jan Kara, Kefeng Wang
Cc: Johannes Weiner, Oliver Sang, Matthew Wilcox, David Hildenbrand,
    Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke,
    oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav,
    Daniel Gomez, David Bueso

On Thu, Mar 27, 2025 at 09:21:30PM -0700, Luis Chamberlain wrote:
> Would the extra ref check added via commit 060913999d7a9e50 ("mm:
> migrate: support poisoned recover from migrate folio") make the removal
> of the spin lock safe now, given all the buffers are locked from the
> folio? This survives some basic sanity checks on my end with
> generic/750 against ext4, and also filling a drive at the same time with
> fio. My feeling is we are not sure; do we have a reproducer for the
> issue reported through ebdf4de5642fb6 ("mm: migrate: fix reference
> check race between __find_get_block() and migration")? I suspect the
> answer is no.
<-- snip -->

> diff --git a/mm/migrate.c b/mm/migrate.c
> index 97f0edf0c032..6a5d125ecde9 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -859,12 +859,12 @@ static int __buffer_migrate_folio(struct address_space *mapping,
>  		}
>  		bh = bh->b_this_page;
>  	} while (bh != head);
> +	spin_unlock(&mapping->i_private_lock);
>  	if (busy) {
>  		if (invalidated) {
>  			rc = -EAGAIN;
>  			goto unlock_buffers;
>  		}
> -		spin_unlock(&mapping->i_private_lock);
>  		invalidate_bh_lrus();
>  		invalidated = true;
>  		goto recheck_buffers;
> @@ -882,8 +882,6 @@ static int __buffer_migrate_folio(struct address_space *mapping,
>  	} while (bh != head);
>  
>  unlock_buffers:
> -	if (check_refs)
> -		spin_unlock(&mapping->i_private_lock);
>  	bh = head;
>  	do {
>  		unlock_buffer(bh);
> diff --git a/mm/util.c b/mm/util.c
> index 448117da071f..3585bdb8700a 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -735,6 +735,8 @@ int folio_mc_copy(struct folio *dst, struct folio *src)
>  	long nr = folio_nr_pages(src);
>  	long i = 0;
>  
> +	might_sleep();
> +
>  	for (;;) {
>  		if (copy_mc_highpage(folio_page(dst, i), folio_page(src, i)))
>  			return -EHWPOISON;

Nah, this ends up producing the following, so I'm inclined at this point
to just revert the 64k block size enablement until we get this figured
out, because I can't think of an easy quick solution to this.

Mar 28 03:35:30 extra-ext4-4k kernel: Linux version 6.14.0-rc7-next-20250321-dirty (mcgrof@beef) (gcc (Debian 14.2.0-16) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #57 SMP PREEMPT_DYNAMIC Fri Mar 28 03:33:04 UTC 2025
Mar 28 03:35:30 extra-ext4-4k kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.14.0-rc7-next-20250321-dirty root=PARTUUID=503fa6f2-2d5b-4d7e-8cf8-3a811de326ce ro console=tty0 console=tty1 console=ttyS0,115200n8 console=ttyS0

<-- snip -->

Mar 28 03:36:32 extra-ext4-4k kernel: EXT4-fs (loop16): mounted filesystem 90cdb700-ad4a-4261-a1be-4f4627772317 r/w with ordered data mode. Quota mode: none.
Mar 28 03:36:37 extra-ext4-4k kernel: EXT4-fs (loop5): mounted filesystem fef0662d-01fc-483d-87ac-8e4ef2939de3 r/w with ordered data mode. Quota mode: none.
Mar 28 03:36:37 extra-ext4-4k kernel: EXT4-fs (loop5): unmounting filesystem fef0662d-01fc-483d-87ac-8e4ef2939de3.
Mar 28 03:36:37 extra-ext4-4k kernel: EXT4-fs (loop16): unmounting filesystem 90cdb700-ad4a-4261-a1be-4f4627772317.
Mar 28 03:36:37 extra-ext4-4k kernel: EXT4-fs (loop16): mounted filesystem 90cdb700-ad4a-4261-a1be-4f4627772317 r/w with ordered data mode. Quota mode: none.
Mar 28 03:36:37 extra-ext4-4k unknown: run fstests generic/750 at 2025-03-28 03:36:37
Mar 28 03:36:39 extra-ext4-4k kernel: EXT4-fs (loop5): mounted filesystem ed8a8fa0-0ea1-4820-aa26-366cd64a6e36 r/w with ordered data mode. Quota mode: none.
Mar 28 03:39:06 extra-ext4-4k kernel: rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P7603 } 8 jiffies s: 565 root: 0x0/T
Mar 28 03:39:06 extra-ext4-4k kernel: rcu: blocking rcu_node structures (internal RCU debug):
Mar 28 03:59:47 extra-ext4-4k kernel: NOHZ tick-stop error: local softirq work is pending, handler #10!!!
Mar 28 04:24:47 extra-ext4-4k kernel: ------------[ cut here ]------------
Mar 28 04:24:47 extra-ext4-4k kernel: WARNING: CPU: 7 PID: 1790 at mm/slub.c:4756 free_large_kmalloc+0xc1/0x100
Mar 28 04:24:47 extra-ext4-4k kernel: Modules linked in: loop sunrpc 9p nls_iso8859_1 kvm_intel nls_cp437 vfat crc32c_generic fat kvm ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel gf128mul crypto_simd cryptd 9pnet_virtio virtio_balloon virtio_console evdev button joydev serio_raw nvme_fabrics nvme_core dm_mod drm nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vsock autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 md_mod virtio_net net_failover failover virtio_blk virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev psmouse virtio virtio_ring
Mar 28 04:24:47 extra-ext4-4k kernel: CPU: 7 UID: 0 PID: 1790 Comm: fsstress Not tainted 6.14.0-rc7-next-20250321-dirty #57 PREEMPT(full)
Mar 28 04:24:47 extra-ext4-4k kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.11-5 01/28/2025
Mar 28 04:24:47 extra-ext4-4k kernel: RIP: 0010:free_large_kmalloc+0xc1/0x100
Mar 28 04:24:47 extra-ext4-4k kernel: Code: f8 00 00 00 75 24 0f 0b 80 3d de 57 3b 01 00 0f 84 4f 63 be ff bd 00 f0 ff ff eb 8e 48 c7 c6 10 03 27 90 e8 61 32 fa ff 0f 0b <0f> 0b 48 83 c4 08 48 89 df 48 c7 c6 18 db 31 90 5b 5d e9 48 32 fa
Mar 28 04:24:47 extra-ext4-4k kernel: RSP: 0018:ffffa95942a67ac8 EFLAGS: 00010202
Mar 28 04:24:47 extra-ext4-4k kernel: RAX: 00000000000000ff RBX: fffffc63c4219c40 RCX: 0000000000000001
Mar 28 04:24:47 extra-ext4-4k kernel: RDX: 0000000000000000 RSI: ffff978e08671000 RDI: fffffc63c4219c40
Mar 28 04:24:47 extra-ext4-4k kernel: RBP: 0000000000000000 R08: 0000000000000020 R09: fffffffffffffff0
Mar 28 04:24:47 extra-ext4-4k kernel: R10: 00000000000000a0 R11: 0000000000000004 R12: 0000000000000000
Mar 28 04:24:47 extra-ext4-4k kernel: R13: ffff978e08671000 R14: 0000000000000000 R15: ffff978d03bf1000
Mar 28 04:24:47 extra-ext4-4k kernel: FS:  00007fefc4670740(0000) GS:ffff978eecda0000(0000) knlGS:0000000000000000
Mar 28 04:24:47 extra-ext4-4k kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 28 04:24:47 extra-ext4-4k kernel: CR2: 00007fefc4872000 CR3: 0000000075fa6002 CR4: 0000000000772ef0
Mar 28 04:24:47 extra-ext4-4k kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 28 04:24:47 extra-ext4-4k kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 28 04:24:47 extra-ext4-4k kernel: PKRU: 55555554
Mar 28 04:24:47 extra-ext4-4k kernel: Call Trace:
Mar 28 04:24:47 extra-ext4-4k kernel: <TASK>
Mar 28 04:24:47 extra-ext4-4k kernel: ? __warn.cold+0xb7/0x14f
Mar 28 04:24:47 extra-ext4-4k kernel: ? free_large_kmalloc+0xc1/0x100
Mar 28 04:24:47 extra-ext4-4k kernel: ? report_bug+0xe6/0x170
Mar 28 04:24:47 extra-ext4-4k kernel: ? free_large_kmalloc+0xc1/0x100
Mar 28 04:24:47 extra-ext4-4k kernel: ? handle_bug+0x199/0x260
Mar 28 04:24:47 extra-ext4-4k kernel: ? exc_invalid_op+0x13/0x60
Mar 28 04:24:47 extra-ext4-4k kernel: ? asm_exc_invalid_op+0x16/0x20
Mar 28 04:24:47 extra-ext4-4k kernel: ? free_large_kmalloc+0xc1/0x100
Mar 28 04:24:47 extra-ext4-4k kernel: ext4_xattr_block_set+0x191/0x1200 [ext4]
Mar 28 04:24:47 extra-ext4-4k kernel: ? xattr_find_entry+0x96/0x110 [ext4]
Mar 28 04:24:47 extra-ext4-4k kernel: ext4_xattr_set_handle+0x572/0x630 [ext4]
Mar 28 04:24:47 extra-ext4-4k kernel: ext4_xattr_set+0x7c/0x150 [ext4]
Mar 28 04:24:47 extra-ext4-4k kernel: __vfs_removexattr+0x7c/0xb0
Mar 28 04:24:47 extra-ext4-4k kernel: __vfs_removexattr_locked+0xb7/0x150
Mar 28 04:24:47 extra-ext4-4k kernel: vfs_removexattr+0x58/0x100
Mar 28 04:24:47 extra-ext4-4k kernel: path_removexattrat+0x17d/0x330
Mar 28 04:24:47 extra-ext4-4k kernel: ? __do_sys_newfstatat+0x33/0x60
Mar 28 04:24:47 extra-ext4-4k kernel: __x64_sys_removexattr+0x19/0x20
Mar 28 04:24:47 extra-ext4-4k kernel: do_syscall_64+0x69/0x140
Mar 28 04:24:47 extra-ext4-4k kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Mar 28 04:24:47 extra-ext4-4k kernel: RIP: 0033:0x7fefc4781037
Mar 28 04:24:47 extra-ext4-4k kernel: Code: f0 ff ff 73 01 c3 48 8b 0d be 8d 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 c5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 8d 0d 00 f7 d8 64 89 01 48
Mar 28 04:24:47 extra-ext4-4k kernel: RSP: 002b:00007ffc2b5a5d48 EFLAGS: 00000206 ORIG_RAX: 00000000000000c5
Mar 28 04:24:47 extra-ext4-4k kernel: RAX: ffffffffffffffda RBX: 000000000002d937 RCX: 00007fefc4781037
Mar 28 04:24:47 extra-ext4-4k kernel: RDX: 0000000000000000 RSI: 00007ffc2b5a5d70 RDI: 0000563075ae5850
Mar 28 04:24:47 extra-ext4-4k kernel: RBP: 00007ffc2b5a5d70 R08: 0000000000000064 R09: 00000000ffffffff
Mar 28 04:24:47 extra-ext4-4k kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 00000000000030d4
Mar 28 04:24:47 extra-ext4-4k kernel: R13: 8f5c28f5c28f5c29 R14: 00007ffc2b5a5e20 R15: 0000563064291ea0
Mar 28 04:24:47 extra-ext4-4k kernel: </TASK>
Mar 28 04:24:47 extra-ext4-4k kernel: irq event stamp: 94586373
Mar 28 04:24:47 extra-ext4-4k kernel: hardirqs last  enabled at (94586383): [<ffffffff8f19ee1e>] __up_console_sem+0x5e/0x70
Mar 28 04:24:47 extra-ext4-4k kernel: hardirqs last disabled at (94586394): [<ffffffff8f19ee03>] __up_console_sem+0x43/0x70
Mar 28 04:24:47 extra-ext4-4k kernel: softirqs last  enabled at (94585948): [<ffffffff8f0ffa53>] __irq_exit_rcu+0xc3/0x120
Mar 28 04:24:47 extra-ext4-4k kernel: softirqs last disabled at (94585929): [<ffffffff8f0ffa53>] __irq_exit_rcu+0xc3/0x120
Mar 28 04:24:47 extra-ext4-4k kernel: ---[ end trace 0000000000000000 ]---
Mar 28 04:24:47 extra-ext4-4k kernel: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x402a88 pfn:0x108671
Mar 28 04:24:47 extra-ext4-4k kernel: flags: 0x57fffc000000000(node=1|zone=2|lastcpupid=0x1ffff)
Mar 28 04:24:47 extra-ext4-4k kernel: raw: 057fffc000000000 dead000000000100 dead000000000122 0000000000000000
Mar 28 04:24:47 extra-ext4-4k kernel: raw: 0000000000402a88 0000000000000000 00000000ffffffff 0000000000000000
Mar 28 04:24:47 extra-ext4-4k kernel: page dumped because: Not a kmalloc allocation
Mar 28 04:50:41 extra-ext4-4k kernel: BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!
Mar 28 04:50:41 extra-ext4-4k kernel: turning off the locking correctness validator.
Mar 28 04:50:41 extra-ext4-4k kernel: CPU: 4 UID: 0 PID: 668 Comm: btrfs-transacti Tainted: G        W          6.14.0-rc7-next-20250321-dirty #57 PREEMPT(full)
Mar 28 04:50:41 extra-ext4-4k kernel: Tainted: [W]=WARN
Mar 28 04:50:41 extra-ext4-4k kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2024.11-5 01/28/2025
Mar 28 04:50:41 extra-ext4-4k kernel: Call Trace:
Mar 28 04:50:41 extra-ext4-4k kernel: <TASK>
Mar 28 04:50:41 extra-ext4-4k kernel: dump_stack_lvl+0x68/0x90
Mar 28 04:50:41 extra-ext4-4k kernel: __lock_acquire+0x1eaf/0x2210
Mar 28 04:50:41 extra-ext4-4k kernel: ? __lock_acquire+0xc77/0x2210
Mar 28 04:50:41 extra-ext4-4k kernel: lock_acquire+0xd1/0x2e0
Mar 28 04:50:41 extra-ext4-4k kernel: ? put_cpu_partial+0x5f/0x1d0
Mar 28 04:50:41 extra-ext4-4k kernel: ? lock_acquire+0xe1/0x2e0
Mar 28 04:50:41 extra-ext4-4k kernel: put_cpu_partial+0x68/0x1d0
Mar 28 04:50:41 extra-ext4-4k kernel: ? put_cpu_partial+0x5f/0x1d0
Mar 28 04:50:41 extra-ext4-4k kernel: get_partial_node.part.0+0xde/0x400
Mar 28 04:50:41 extra-ext4-4k kernel: ___slab_alloc+0x361/0x13c0
Mar 28 04:50:41 extra-ext4-4k kernel: ? __alloc_object+0x2f/0x240
Mar 28 04:50:41 extra-ext4-4k kernel: ? mark_held_locks+0x40/0x70
Mar 28 04:50:41 extra-ext4-4k kernel: ? ___slab_alloc+0x701/0x13c0
Mar 28 04:50:41 extra-ext4-4k kernel: ? lockdep_hardirqs_on+0x78/0x100
Mar 28 04:50:41 extra-ext4-4k kernel: ? __alloc_object+0x2f/0x240
Mar 28 04:50:41 extra-ext4-4k kernel: ? __slab_alloc.isra.0+0x52/0xa0
Mar 28 04:50:41 extra-ext4-4k kernel: __slab_alloc.isra.0+0x52/0xa0
Mar 28 04:50:41 extra-ext4-4k kernel: ? __alloc_object+0x2f/0x240
Mar 28 04:50:41 extra-ext4-4k kernel: kmem_cache_alloc_noprof+0x1e3/0x430
Mar 28 04:50:41 extra-ext4-4k kernel: ? xas_alloc+0x9f/0xc0
Mar 28 04:50:41 extra-ext4-4k kernel: __alloc_object+0x2f/0x240
Mar 28 04:50:41 extra-ext4-4k kernel: __create_object+0x22/0x90
Mar 28 04:50:41 extra-ext4-4k kernel: ? xas_alloc+0x9f/0xc0
Mar 28 04:50:41 extra-ext4-4k kernel: kmem_cache_alloc_lru_noprof+0x337/0x430
Mar 28 04:50:41 extra-ext4-4k kernel: ? __lock_acquire+0x45d/0x2210
Mar 28 04:50:41 extra-ext4-4k kernel: ? stack_depot_save_flags+0x23/0x9d0
Mar 28 04:50:41 extra-ext4-4k kernel: xas_alloc+0x9f/0xc0
Mar 28 04:50:41 extra-ext4-4k kernel: xas_create+0x309/0x6f0
Mar 28 04:50:41 extra-ext4-4k kernel: xas_store+0x54/0x700
Mar 28 04:50:41 extra-ext4-4k kernel: __xa_cmpxchg+0xb9/0x140
Mar 28 04:50:41 extra-ext4-4k kernel: add_delayed_ref+0x11d/0xa50 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel: btrfs_alloc_tree_block+0x3ea/0x5a0 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel: split_leaf+0x167/0x6d0 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel: setup_leaf_for_split+0x19f/0x200 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel: btrfs_split_item+0x21/0x50 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel: btrfs_del_csums+0x270/0x3a0 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel: ? btrfs_csum_root+0x83/0xb0 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel: __btrfs_free_extent.isra.0+0x5fb/0xcc0 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel: __btrfs_run_delayed_refs+0x51d/0xf40 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel: btrfs_run_delayed_refs+0x3d/0x110 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel: btrfs_commit_transaction+0x8f/0xee0 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel: ? btrfs_init_block_rsv+0x51/0x60 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel: ? start_transaction+0x22c/0xaa0 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel: transaction_kthread+0x152/0x1b0 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel: ? __pfx_transaction_kthread+0x10/0x10 [btrfs]
Mar 28 04:50:41 extra-ext4-4k kernel: kthread+0x107/0x250
Mar 28 04:50:41 extra-ext4-4k kernel: ? find_held_lock+0x2b/0x80
Mar 28 04:50:41 extra-ext4-4k kernel: ? ret_from_fork+0x17/0x50
Mar 28 04:50:41 extra-ext4-4k kernel: ? ret_from_fork+0x17/0x50
Mar 28 04:50:41 extra-ext4-4k kernel: ? lock_release+0x17d/0x2c0
Mar 28 04:50:41 extra-ext4-4k kernel: ? __pfx_kthread+0x10/0x10
Mar 28 04:50:41 extra-ext4-4k kernel: ? __pfx_kthread+0x10/0x10
Mar 28 04:50:41 extra-ext4-4k kernel: ret_from_fork+0x2d/0x50
Mar 28 04:50:41 extra-ext4-4k kernel: ? __pfx_kthread+0x10/0x10
Mar 28 04:50:41 extra-ext4-4k kernel: ret_from_fork_asm+0x1a/0x30
Mar 28 04:50:41 extra-ext4-4k kernel: </TASK>
Mar 28 05:04:32 extra-ext4-4k kernel: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x20889c pfn:0x4a3e
Mar 28 05:04:32 extra-ext4-4k kernel: flags: 0xffffc000000000(node=0|zone=1|lastcpupid=0x1ffff)
Mar 28 05:04:32 extra-ext4-4k kernel: raw: 00ffffc000000000 fffffc63c041d448 ffff978d7bc347f0 0000000000000000
Mar 28 05:04:32 extra-ext4-4k kernel: raw: 000000000020889c 0000000000000000 00000000ffffffff 0000000000000000
Mar 28 05:04:32 extra-ext4-4k kernel: page dumped because: Not a kmalloc allocation
Mar 28 05:31:13 extra-ext4-4k kernel: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x498b96 pfn:0x76f4
Mar 28 05:31:13 extra-ext4-4k kernel: flags: 0xffffc000000000(node=0|zone=1|lastcpupid=0x1ffff)
Mar 28 05:31:13 extra-ext4-4k kernel: raw: 00ffffc000000000 fffffc63c01d9308 fffffc63c01df648 0000000000000000
Mar 28 05:31:13 extra-ext4-4k kernel: raw: 0000000000498b96 0000000000000000 00000000ffffffff 0000000000000000
Mar 28 05:31:13 extra-ext4-4k kernel: page dumped because: Not a kmalloc allocation
Mar 28 05:57:09 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_get_first_dir_block:3538: inode #5174: comm fsstress: directory missing '.'
Mar 28 06:04:43 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5176: comm fsstress: directory missing '.'
Mar 28 06:05:19 extra-ext4-4k kernel: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x243117 pfn:0x104ddb
Mar 28 06:05:19 extra-ext4-4k kernel: flags: 0x57fffc000000000(node=1|zone=2|lastcpupid=0x1ffff)
Mar 28 06:05:19 extra-ext4-4k kernel: raw: 057fffc000000000 fffffc63c4136fc8 ffff978d7bcb4970 0000000000000000
Mar 28 06:05:19 extra-ext4-4k kernel: raw: 0000000000243117 0000000000000000 00000000ffffffff 0000000000000000
Mar 28 06:05:19 extra-ext4-4k kernel: page dumped because: Not a kmalloc allocation
Mar 28 06:15:16 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5176: comm fsstress: directory missing '.'
Mar 28 06:23:04 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:23:15 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:23:23 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:23:28 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:23:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:24:02 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:24:35 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:30:04 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5173: comm fsstress: directory missing '.'
Mar 28 06:32:30 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5188: comm fsstress: directory missing '.'
Mar 28 06:32:39 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5176: comm fsstress: directory missing '.'
Mar 28 06:38:54 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5173: comm fsstress: directory missing '.'
Mar 28 06:41:37 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5187: comm fsstress: directory missing '.'
Mar 28 06:42:05 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:42:06 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:42:22 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:42:38 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:42:42 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:42:53 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:42:54 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:43:02 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:43:12 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:43:15 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:53:28 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5188: comm fsstress: directory missing '.'
Mar 28 06:54:36 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5176: comm fsstress: directory missing '.'
Mar 28 06:55:07 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:55:09 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 06:55:12 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0
Mar 28 07:04:21 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_get_first_dir_block:3538: inode #5173: comm fsstress: directory missing '.'
Mar 28 07:11:04 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5173: comm fsstress: directory missing '.'
Mar 28 07:13:11 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_get_first_dir_block:3538: inode #5174: comm fsstress: directory missing '.'
Mar 28 07:15:45 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0
Mar 28 07:15:49 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0
Mar 28 07:15:51 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0
Mar 28 07:15:52 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0
Mar 28 07:16:00 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0
Mar 28 07:16:41 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_get_first_dir_block:3538: inode #5187: comm fsstress: directory missing '.'
Mar 28 07:24:00 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0
Mar 28 07:24:31 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0
Mar 28 07:25:40 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8703: comm fsstress: checksumming directory block 0
Mar 28 07:25:47 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8703: comm fsstress: checksumming directory block 0
Mar 28 07:25:50 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8703: comm fsstress: checksumming directory block 0
Mar 28 07:26:18 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8684: comm fsstress: checksumming directory block 0
Mar 28 07:41:04 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_get_first_dir_block:3538: inode #5188: comm fsstress: directory missing '.'
Mar 28 07:41:11 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5174: comm fsstress: checksumming directory block 0
Mar 28 07:44:41 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_get_first_dir_block:3538: inode #5187: comm fsstress: directory missing '.'
Mar 28 07:47:20 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5187: comm fsstress: directory missing '.'
Mar 28 07:47:28 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5187: comm fsstress: directory missing '.'
Mar 28 07:47:56 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0
Mar 28 07:49:05 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5187: comm fsstress: directory missing '.'
Mar 28 07:53:26 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5173: comm fsstress: directory missing '.'
Mar 28 08:16:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0) Mar 28 08:16:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0) Mar 28 08:16:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0) Mar 28 08:16:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0) Mar 28 08:16:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0) Mar 28 08:16:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0) Mar 28 08:16:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0) Mar 28 08:16:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0) Mar 28 08:16:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 
33667(0), depth 33667(0) Mar 28 08:16:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0) Mar 28 08:19:26 extra-ext4-4k kernel: EXT4-fs error: 6 callbacks suppressed Mar 28 08:19:26 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0 Mar 28 08:21:37 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5173: comm fsstress: directory missing '.' Mar 28 08:28:17 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5173: comm fsstress: directory missing '.' Mar 28 08:30:17 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0) Mar 28 08:31:02 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5176: comm fsstress: directory missing '.' 
Mar 28 08:32:21 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8699: comm fsstress: checksumming directory block 0 Mar 28 08:32:23 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8699: comm fsstress: checksumming directory block 0 Mar 28 08:32:24 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8699: comm fsstress: checksumming directory block 0 Mar 28 08:32:31 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8699: comm fsstress: checksumming directory block 0 Mar 28 08:32:36 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8699: comm fsstress: checksumming directory block 0 Mar 28 08:32:43 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5176: comm fsstress: directory missing '.' Mar 28 08:34:47 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5174: comm fsstress: directory missing '.' Mar 28 08:34:58 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5173: comm fsstress: directory missing '.' Mar 28 08:35:01 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5187: comm fsstress: directory missing '.' Mar 28 08:37:11 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8699: comm fsstress: checksumming directory block 0 Mar 28 08:37:12 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8699: comm fsstress: checksumming directory block 0 Mar 28 08:37:14 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5188: comm fsstress: directory missing '.' 
Mar 28 08:37:17 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #8699: comm fsstress: checksumming directory block 0 Mar 28 08:39:32 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5174: comm fsstress: directory missing '.' Mar 28 08:40:52 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0 Mar 28 08:40:55 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0 Mar 28 08:41:03 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5173: comm fsstress: directory missing '.' Mar 28 08:54:04 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_get_first_dir_block:3538: inode #5187: comm fsstress: directory missing '.' Mar 28 08:58:02 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5188: comm fsstress: directory missing '.' Mar 28 09:00:10 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_get_first_dir_block:3538: inode #5174: comm fsstress: directory missing '.' Mar 28 09:01:30 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5174: comm fsstress: checksumming directory block 0 Mar 28 09:04:55 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5176: comm fsstress: directory missing '.' Mar 28 09:05:48 extra-ext4-4k kernel: EXT4-fs warning (device loop5): ext4_empty_dir:3088: inode #5188: comm fsstress: directory missing '.' 
Mar 28 09:07:16 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0 Mar 28 09:07:21 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0 Mar 28 09:07:31 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5176: comm fsstress: checksumming directory block 0 Mar 28 09:07:33 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0 Mar 28 09:07:34 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0 Mar 28 09:07:42 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0 Mar 28 09:07:43 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0 Mar 28 09:07:49 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5173: comm fsstress: checksumming directory block 0 Mar 28 09:13:23 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5188: comm fsstress: checksumming directory block 0 Mar 28 09:13:44 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5188: comm fsstress: checksumming directory block 0 Mar 28 09:13:56 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5188: comm fsstress: checksumming directory block 0 Mar 28 09:14:06 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5188: comm fsstress: checksumming directory block 0 Mar 28 09:14:33 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5187: comm fsstress: checksumming directory block 0 Mar 28 09:14:35 extra-ext4-4k 
kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5187: comm fsstress: checksumming directory block 0 Mar 28 09:14:50 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5187: comm fsstress: checksumming directory block 0 Mar 28 09:14:51 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5187: comm fsstress: checksumming directory block 0 Mar 28 09:14:53 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5187: comm fsstress: checksumming directory block 0 Mar 28 09:14:54 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5187: comm fsstress: checksumming directory block 0 Mar 28 09:14:55 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5187: comm fsstress: checksumming directory block 0 Mar 28 09:14:56 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5187: comm fsstress: checksumming directory block 0 Mar 28 09:14:57 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5187: comm fsstress: checksumming directory block 0 Mar 28 09:15:00 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5187: comm fsstress: checksumming directory block 0 Mar 28 09:15:11 extra-ext4-4k kernel: EXT4-fs error (device loop5): __ext4_find_entry:1626: inode #5187: comm fsstress: checksumming directory block 0 Mar 28 09:16:55 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_find_extent:938: inode #1104560: comm fsstress: pblk 4932229 bad header/extent: invalid magic - magic 8383, entries 33667, max 33667(0), depth 33667(0) Mar 28 09:17:22 extra-ext4-4k kernel: NOHZ tick-stop error: local softirq work is pending, handler #10!!! ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-28  9:47 ` Luis Chamberlain
@ 2025-03-28 19:09 ` Luis Chamberlain
  2025-03-29  0:08   ` Luis Chamberlain
  0 siblings, 1 reply; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-28 19:09 UTC (permalink / raw)
To: Jan Kara, Kefeng Wang, Sebastian Andrzej Siewior, David Bueso, Tso Ted, Ritesh Harjani
Cc: Johannes Weiner, Oliver Sang, Matthew Wilcox, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, Dave Chinner, gost.dev, mcgrof

On Fri, Mar 28, 2025 at 02:48:00AM -0700, Luis Chamberlain wrote:
> On Thu, Mar 27, 2025 at 09:21:30PM -0700, Luis Chamberlain wrote:
> > Would the extra ref check added via commit 060913999d7a9e50 ("mm:
> > migrate: support poisoned recover from migrate folio") make the removal
> > of the spin lock safe now given all the buffers are locked from the
> > folio? This survives some basic sanity checks on my end with
> > generic/750 against ext4 and also filling a drive at the same time with
> > fio. I have a feeling is we are not sure, do we have a reproducer for
> > the issue reported through ebdf4de5642fb6 ("mm: migrate: fix reference
> > check race between __find_get_block() and migration")? I suspect the
> > answer is no.

Sebastian, David, is there a reason CONFIG_DEBUG_ATOMIC_SLEEP=y won't
trigger an atomic sleeping context warning when cond_resched() is used?
Syzbot and 0-day had ways to reproduce a kernel warning under these
conditions, but this config didn't, and required an explicit might_sleep():

CONFIG_PREEMPT_BUILD=y
CONFIG_ARCH_HAS_PREEMPT_LAZY=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
# CONFIG_PREEMPT_LAZY is not set
# CONFIG_PREEMPT_RT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_RCU=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_DEBUG_PREEMPT=y
CONFIG_PREEMPTIRQ_TRACEPOINTS=y
# CONFIG_PREEMPT_TRACER is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set

Are there some preemption configs under which cond_resched() won't
trigger a kernel splat where expected? The only thing I can think of is
that perhaps some preempt configs don't imply a sleep. If true, instead
of adding might_sleep() to one piece of code (in this case
folio_mc_copy()) I wonder if instead just adding it to cond_resched()
may be useful.

Note that the issue in question wouldn't trigger at all with ext4; that
some reports suggest it happened with btrfs (0-day) with LTP, or with
another test from syzbot, was just coincidence on any filesystem. The
only way to really reproduce this was by triggering compaction with the
block device cache, as we're now enabling large folios on the block
device cache. We've narrowed that down to a simple reproducer of running
dd if=/dev/zero of=/dev/vde bs=1024M count=1024 and adding the
might_sleep() on folio_mc_copy().

Then as for the issue we're analyzing: now that I'm back home I think
it's important to highlight that generic/750 seems likely able to
reproduce the original issue reported by commit ebdf4de5642fb6 ("mm:
migrate: fix reference check race between __find_get_block() and
migration"), and that it takes about 3 hours to reproduce, which
requires reverting that commit which added the spin lock:

Mar 28 03:36:37 extra-ext4-4k unknown: run fstests generic/750 at 2025-03-28 03:36:37
<-- snip -->
Mar 28 05:57:09 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_get_first_dir_block:3538: inode #5174: comm fsstress: directory missing '.'

Jan, can you confirm if the symptoms match the original report?

It would be good for us to see if running the generic/764 test I am
proposing [0] can reproduce that corruption faster than 3 hours. If we
have a reproducer we can work on evaluating a fix for both the older
ext4 issue reported by commit ebdf4de5642fb6 and also remove the spin
lock from page migration to support large folios.

And lastly, can __find_get_block() avoid running in case of page
migration? Do we have semantics from a filesystem perspective to prevent
filesystem work from going on while page migration on a folio is
happening in atomic context? If not, do we need them?

[0] https://lore.kernel.org/all/20250326185101.2237319-1-mcgrof@kernel.org/T/#u

  Luis
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-28 19:09 ` Luis Chamberlain
@ 2025-03-29  0:08 ` Luis Chamberlain
  2025-03-29  1:06   ` Luis Chamberlain
  2025-03-31  7:45   ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-29 0:08 UTC (permalink / raw)
To: Jan Kara, Kefeng Wang, Sebastian Andrzej Siewior, David Bueso, Tso Ted, Ritesh Harjani
Cc: Johannes Weiner, Oliver Sang, Matthew Wilcox, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, Dave Chinner, gost.dev

On Fri, Mar 28, 2025 at 12:09:06PM -0700, Luis Chamberlain wrote:
> On Fri, Mar 28, 2025 at 02:48:00AM -0700, Luis Chamberlain wrote:
> > On Thu, Mar 27, 2025 at 09:21:30PM -0700, Luis Chamberlain wrote:
> > > Would the extra ref check added via commit 060913999d7a9e50 ("mm:
> > > migrate: support poisoned recover from migrate folio") make the removal
> > > of the spin lock safe now given all the buffers are locked from the
> > > folio? This survives some basic sanity checks on my end with
> > > generic/750 against ext4 and also filling a drive at the same time with
> > > fio. I have a feeling is we are not sure, do we have a reproducer for
> > > the issue reported through ebdf4de5642fb6 ("mm: migrate: fix reference
> > > check race between __find_get_block() and migration")? I suspect the
> > > answer is no.
>
> Sebastian, David, is there a reason CONFIG_DEBUG_ATOMIC_SLEEP=y won't
> trigger an atomic sleeping context warning when cond_resched() is used?
> Syzbot and 0-day had ways to reproduce a kernel warning under these
> conditions, but this config didn't, and required an explicit might_sleep()
>
> CONFIG_PREEMPT_BUILD=y
> CONFIG_ARCH_HAS_PREEMPT_LAZY=y
> # CONFIG_PREEMPT_NONE is not set
> # CONFIG_PREEMPT_VOLUNTARY is not set
> CONFIG_PREEMPT=y
> # CONFIG_PREEMPT_LAZY is not set
> # CONFIG_PREEMPT_RT is not set
> CONFIG_PREEMPT_COUNT=y
> CONFIG_PREEMPTION=y
> CONFIG_PREEMPT_DYNAMIC=y
> CONFIG_PREEMPT_RCU=y
> CONFIG_HAVE_PREEMPT_DYNAMIC=y
> CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
> CONFIG_PREEMPT_NOTIFIERS=y
> CONFIG_DEBUG_PREEMPT=y
> CONFIG_PREEMPTIRQ_TRACEPOINTS=y
> # CONFIG_PREEMPT_TRACER is not set
> # CONFIG_PREEMPTIRQ_DELAY_TEST is not set
>
> Are there some preemption configs under which cond_resched() won't
> trigger a kernel splat where expected so the only thing I can think of
> is perhaps some preempt configs don't implicate a sleep? If true,
> instead of adding might_sleep() to one piece of code (in this case
> folio_mc_copy()) I wonder if instead just adding it to cond_resched() may
> be useful.

I think the answer to the above is "no". And it took me quite some more
testing with the below patch to convince myself of that.

Essentially, to trigger the cond_resched() sleeping-while-atomic kernel
warning we'd need to be in atomic context, and today we can get there
through folio_mc_copy() with large folios. Today the only atomic context
we know of which would end up in page migration and folio_mc_copy()
would be with buffer-head filesystems which support large folios and
which use buffer_migrate_folio_norefs() for their migrate_folio()
callback. The patch we added to enable large folio support in the block
layer did this only for cases where the block size of the backing device
is > PAGE_SIZE. So for instance your qemu guest would need to have a
logical block size larger than 4096 on x86_64. To be clear, ext4 cannot
possibly trigger this.
No filesystem can trigger this *case* other than the block device cache,
and that is only possible if block devices have larger block sizes. The
whole puzzle above about cond_resched() not triggering the atomic
warning is because, although buffer_migrate_folio_norefs() *does* always
use atomic context to call filemap_migrate_folio(), in practice I'm not
seeing it; that is, we likely bail before we even call folio_mc_copy().
So for instance we can see:

Mar 28 23:22:04 extra-ext4-4k kernel: __buffer_migrate_folio() in_atomic: 1
Mar 28 23:22:04 extra-ext4-4k kernel: __buffer_migrate_folio() in_atomic: 1
Mar 28 23:23:11 extra-ext4-4k kernel: large folios on folio_mc_copy(): 512 in_atomic(): 0
Mar 28 23:23:11 extra-ext4-4k kernel: large folios on folio_mc_copy(): in_atomic(): 0 calling cond_resched()
Mar 28 23:23:11 extra-ext4-4k kernel: large folios on folio_mc_copy(): in_atomic(): 0 calling cond_resched()

diff --git a/block/bdev.c b/block/bdev.c
index 4844d1e27b6f..1db9edfc4bc1 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -147,6 +147,11 @@ static void set_init_blocksize(struct block_device *bdev)
 			break;
 		bsize <<= 1;
 	}
+
+	if (bsize > PAGE_SIZE)
+		printk("%s: LBS device: mapping_set_folio_min_order(%u): %u\n",
+		       bdev->bd_disk->disk_name, get_order(bsize), bsize);
+
 	BD_INODE(bdev)->i_blkbits = blksize_bits(bsize);
 	mapping_set_folio_min_order(BD_INODE(bdev)->i_mapping,
 				    get_order(bsize));
diff --git a/mm/migrate.c b/mm/migrate.c
index f3ee6d8d5e2e..210df4970573 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -851,6 +851,7 @@ static int __buffer_migrate_folio(struct address_space *mapping,
 recheck_buffers:
 	busy = false;
 	spin_lock(&mapping->i_private_lock);
+	printk("__buffer_migrate_folio() in_atomic: %d\n", in_atomic());
 	bh = head;
 	do {
 		if (atomic_read(&bh->b_count)) {
@@ -871,6 +872,8 @@ static int __buffer_migrate_folio(struct address_space *mapping,
 		}
 	}
 
+	if (check_refs)
+		printk("__buffer_migrate_folio() calling filemap_migrate_folio() in_atomic: %d\n",
+		       in_atomic());
 	rc = filemap_migrate_folio(mapping, dst, src, mode);
 	if (rc != MIGRATEPAGE_SUCCESS)
 		goto unlock_buffers;
diff --git a/mm/util.c b/mm/util.c
index 448117da071f..61c76712d4bb 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -735,11 +735,15 @@ int folio_mc_copy(struct folio *dst, struct folio *src)
 	long nr = folio_nr_pages(src);
 	long i = 0;
 
+	if (nr > 1)
+		printk("large folios on folio_mc_copy(): %lu in_atomic(): %d\n", nr, in_atomic());
+
 	for (;;) {
 		if (copy_mc_highpage(folio_page(dst, i), folio_page(src, i)))
 			return -EHWPOISON;
 		if (++i == nr)
 			break;
+		printk("large folios on folio_mc_copy(): in_atomic(): %d calling cond_resched()\n", in_atomic());
 		cond_resched();
 	}

And so effectively, it is true: cond_resched() is not in atomic context
above, even though filemap_migrate_folio() is certainly being called in
atomic context. What changes in between is that folios likely won't
migrate due to later checks in filemap_migrate_folio(), like the new ref
check, and instead we end up with page migration of a huge page later,
and *that* is not in atomic context.

So, to be clear, I *still* cannot reproduce the original reports, even
though in theory it is evident how buffer_migrate_folio_norefs() *can*
call filemap_migrate_folio() in atomic context. How 0-day and syzbot
triggered this *without* a large block size block device is perplexing
to me, if it is true that one was not used. How we still can't reproduce
in_atomic() context in folio_mc_copy() is another fun mystery. That is
to say, I can't see how the existing code could regress here, given the
only buffer-head filesystem which enables large folios is the pseudo
block device cache filesystem, and you'll only get LBS devices if the
logical block size is > PAGE_SIZE.
Despite all this, we have two separate reports and no clear information
on whether a large block size device was used or not, and so given the
traces above, to help root out more bugs with large folios we should
just proactively add might_sleep() to __migrate_folio(). I'll send a
patch for that; it'll enhance our test coverage.

The reason we are likely having a hard time reproducing the issue is
this new check:

	/* Check whether src does not have extra refs before we do more work */
	if (folio_ref_count(src) != expected_count)
		return -EAGAIN;

So, moving on, I think what's best is to see how we can get
__find_get_block() to not chug on during page migration.

  Luis
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-29  0:08 ` Luis Chamberlain
@ 2025-03-29  1:06 ` Luis Chamberlain
  2025-03-31  7:45 ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 31+ messages in thread
From: Luis Chamberlain @ 2025-03-29 1:06 UTC (permalink / raw)
To: Jan Kara, Kefeng Wang, Sebastian Andrzej Siewior, David Bueso, Tso Ted, Ritesh Harjani
Cc: Johannes Weiner, Oliver Sang, Matthew Wilcox, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, Dave Chinner, gost.dev

On Fri, Mar 28, 2025 at 05:08:40PM -0700, Luis Chamberlain wrote:
> So, moving on, I think what's best is to see how we can get __find_get_block()
> to not chug on during page migration.

Something like this maybe? It passes the initial 10 minutes of
generic/750 on ext4 while also blasting an LBS device with dd. I'll let
it soak. The second patch is what requires more eyeballs / suggestions /
ideas.

From 86b2315f3c80dd4562a1a0fa0734921d3e92398f Mon Sep 17 00:00:00 2001
From: Luis Chamberlain <mcgrof@kernel.org>
Date: Fri, 28 Mar 2025 17:12:48 -0700
Subject: [PATCH 1/3] mm/migrate: add might_sleep() on __migrate_folio()

When we do page migration of large folios folio_mc_copy() can
cond_resched() *iff* we are on a large folio. There's a hairy bug
reported by both 0-day [0] and syzbot [1] where it has been detected we
can call folio_mc_copy() in atomic context. Technically speaking that
should in theory only be possible today from buffer-head filesystems
using buffer_migrate_folio_norefs() on page migration, and the only
buffer-head filesystem with large folios is the block device cache, and
so only with block devices with large block sizes.
However tracing shows that folio_mc_copy() *isn't* being called as often
as we'd expect from the buffer_migrate_folio_norefs() path, as we're
likely bailing early now thanks to the check added by commit
060913999d7a ("mm: migrate: support poisoned recover from migrate
folio"). *Most* folio_mc_copy() calls in turn end up *not* being in
atomic context, and so we won't hit a splat when using:

CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_ATOMIC_SLEEP=y

But we *want* to help proactively find callers of __migrate_folio() in
atomic context, so make the might_sleep() explicit to help us root out
large folio atomic callers of migrate_folio().

Link: https://lkml.kernel.org/r/202503101536.27099c77-lkp@intel.com # [0]
Link: https://lkml.kernel.org/r/67e57c41.050a0220.2f068f.0033.GAE@google.com # [1]
Link: https://lkml.kernel.org/r/Z-c6BqCSmAnNxb57@bombadil.infradead.org # [2]
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 mm/migrate.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index f3ee6d8d5e2e..712ddd11f3f0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -751,6 +751,8 @@ static int __migrate_folio(struct address_space *mapping, struct folio *dst,
 {
 	int rc, expected_count = folio_expected_refs(mapping, src);
 
+	might_sleep();
+
 	/* Check whether src does not have extra refs before we do more work */
 	if (folio_ref_count(src) != expected_count)
 		return -EAGAIN;
-- 
2.47.2

From 561e94951fce481bb2e5917230bec7008c131d9a Mon Sep 17 00:00:00 2001
From: Luis Chamberlain <mcgrof@kernel.org>
Date: Fri, 28 Mar 2025 17:44:10 -0700
Subject: [PATCH 2/3] fs/buffer: avoid getting buffer if it is folio migration candidate

Avoid giving away a buffer with __find_get_block_slow() if the folio may
be a folio migration candidate.
We do this as an alternative to the issue fixed by commit ebdf4de5642fb6
("mm: migrate: fix reference check race between __find_get_block() and
migration"), given we've determined that we should avoid requiring folio
migration callers to hold a spin lock while calling __migrate_folio().

This alternative simply avoids completing __find_get_block_slow() on
folio migration candidates to let us later rip out the spin_lock() held
on the buffer_migrate_folio_norefs() path.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 fs/buffer.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/buffer.c b/fs/buffer.c
index c7abb4a029dc..6e2c3837a202 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -208,6 +208,12 @@ __find_get_block_slow(struct block_device *bdev, sector_t block)
 	head = folio_buffers(folio);
 	if (!head)
 		goto out_unlock;
+
+	if (folio_test_lru(folio) &&
+	    folio_test_locked(folio) &&
+	    !folio_test_writeback(folio))
+		goto out_unlock;
+
 	bh = head;
 	do {
 		if (!buffer_mapped(bh))
-- 
2.47.2

From af6963b73a8406162e6c2223fae600a799402e2b Mon Sep 17 00:00:00 2001
From: Luis Chamberlain <mcgrof@kernel.org>
Date: Fri, 28 Mar 2025 17:51:39 -0700
Subject: [PATCH 3/3] mm/migrate: avoid atomic context on buffer_migrate_folio_norefs() migration

buffer_migrate_folio_norefs() should avoid holding the spin lock in
order to ensure we can support large folios. The prior commit
"fs/buffer: avoid getting buffer if it is folio migration candidate"
ripped out the only rationale for having the atomic context, so we can
remove the spin lock call now.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 mm/migrate.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 712ddd11f3f0..f3047c685706 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -861,12 +861,12 @@ static int __buffer_migrate_folio(struct address_space *mapping,
 		}
 		bh = bh->b_this_page;
 	} while (bh != head);
+	spin_unlock(&mapping->i_private_lock);
 	if (busy) {
 		if (invalidated) {
 			rc = -EAGAIN;
 			goto unlock_buffers;
 		}
-		spin_unlock(&mapping->i_private_lock);
 		invalidate_bh_lrus();
 		invalidated = true;
 		goto recheck_buffers;
@@ -884,8 +884,6 @@ static int __buffer_migrate_folio(struct address_space *mapping,
 	} while (bh != head);
 
 unlock_buffers:
-	if (check_refs)
-		spin_unlock(&mapping->i_private_lock);
 	bh = head;
 	do {
 		unlock_buffer(bh);
-- 
2.47.2
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-29  0:08 ` Luis Chamberlain
  2025-03-29  1:06 ` Luis Chamberlain
@ 2025-03-31  7:45 ` Sebastian Andrzej Siewior
  2025-04-08 16:43   ` Darrick J. Wong
  1 sibling, 1 reply; 31+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-03-31 7:45 UTC (permalink / raw)
To: Luis Chamberlain
Cc: Jan Kara, Kefeng Wang, David Bueso, Tso Ted, Ritesh Harjani, Johannes Weiner, Oliver Sang, Matthew Wilcox, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, Dave Chinner, gost.dev

On 2025-03-28 17:08:38 [-0700], Luis Chamberlain wrote:
…
> > Are there some preemption configs under which cond_resched() won't
> > trigger a kernel splat where expected so the only thing I can think of
> > is perhaps some preempt configs don't implicate a sleep? If true,
> > instead of adding might_sleep() to one piece of code (in this case
> > folio_mc_copy()) I wonder if instead just adding it to cond_resched() may
> > be useful.
>
> I think the answer to the above is "no".

I would say so. You need CONFIG_DEBUG_ATOMIC_SLEEP for the might-sleep
magic to work. And then the splat from might_sleep() isn't different
from the one from cond_resched().

> Luis

Sebastian
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c
  2025-03-31  7:45 ` Sebastian Andrzej Siewior
@ 2025-04-08 16:43 ` Darrick J. Wong
  2025-04-08 17:06   ` Luis Chamberlain
  0 siblings, 1 reply; 31+ messages in thread
From: Darrick J. Wong @ 2025-04-08 16:43 UTC (permalink / raw)
To: Luis Chamberlain
Cc: Jan Kara, Kefeng Wang, David Bueso, Tso Ted, Ritesh Harjani, Johannes Weiner, Oliver Sang, Matthew Wilcox, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, Dave Chinner, gost.dev, linux-fsdevel

Hi Luis,

I'm not sure if this is related, but I'm seeing the same "BUG: sleeping
function called from invalid context at mm/util.c:743" message when
running fstests on XFS. Nothing exciting with fstests here other than
the machine is arm64 with 64k basepages and 4k fsblock size:

MKFS_OPTIONS="-m metadir=1,autofsck=1,uquota,gquota,pquota"

--D

[18182.889554] run fstests generic/457 at 2025-04-07 23:06:25
[18182.973535] spectre-v4 mitigation disabled by command-line option
[18184.849467] XFS (sda3): EXPERIMENTAL metadata directory tree feature enabled. Use at your own risk!
[18184.852941] XFS (sda3): EXPERIMENTAL exchange range feature enabled. Use at your own risk!
[18184.852962] XFS (sda3): EXPERIMENTAL parent pointer feature enabled. Use at your own risk!
[18184.858065] XFS (sda3): Mounting V5 Filesystem 13d8c72d-ddac-4052-8d3c-a82c4ce0377d
[18184.900002] XFS (sda3): Ending clean mount
[18184.905990] XFS (sda3): Quotacheck needed: Please wait.
[18184.919801] XFS (sda3): Quotacheck: Done.
[18184.954170] XFS (sda3): Unmounting Filesystem 13d8c72d-ddac-4052-8d3c-a82c4ce0377d
[18186.165572] XFS (dm-4): EXPERIMENTAL metadata directory tree feature enabled. Use at your own risk!
[18186.165601] XFS (dm-4): EXPERIMENTAL exchange range feature enabled. Use at your own risk!
[18186.165608] XFS (dm-4): EXPERIMENTAL parent pointer feature enabled. Use at your own risk! [18186.169589] XFS (dm-4): Mounting V5 Filesystem 6ade490d-15b0-43e5-9f17-db534769c746 [18187.121289] XFS (dm-4): Ending clean mount [18187.131797] XFS (dm-4): Quotacheck needed: Please wait. [18187.145700] XFS (dm-4): Quotacheck: Done. [18187.393486] XFS (dm-4): Unmounting Filesystem 6ade490d-15b0-43e5-9f17-db534769c746 [18190.592061] XFS (dm-3): EXPERIMENTAL metadata directory tree feature enabled. Use at your own risk! [18190.592083] XFS (dm-3): EXPERIMENTAL exchange range feature enabled. Use at your own risk! [18190.592089] XFS (dm-3): EXPERIMENTAL parent pointer feature enabled. Use at your own risk! [18190.601815] XFS (dm-3): Mounting V5 Filesystem 6ade490d-15b0-43e5-9f17-db534769c746 [18190.744215] XFS (dm-3): Starting recovery (logdev: internal) [18190.807553] XFS (dm-3): Ending recovery (logdev: internal) [18190.818708] XFS (dm-3): Unmounting Filesystem 6ade490d-15b0-43e5-9f17-db534769c746 [18193.786621] XFS (dm-3): EXPERIMENTAL metadata directory tree feature enabled. Use at your own risk! [18193.788879] XFS (dm-3): EXPERIMENTAL exchange range feature enabled. Use at your own risk! [18193.788882] XFS (dm-3): EXPERIMENTAL parent pointer feature enabled. Use at your own risk! [18193.790518] XFS (dm-3): Mounting V5 Filesystem 6ade490d-15b0-43e5-9f17-db534769c746 [18193.877969] XFS (dm-3): Starting recovery (logdev: internal) [18193.917688] XFS (dm-3): Ending recovery (logdev: internal) [18193.945675] XFS (dm-3): Unmounting Filesystem 6ade490d-15b0-43e5-9f17-db534769c746 [18196.985726] XFS (dm-3): EXPERIMENTAL metadata directory tree feature enabled. Use at your own risk! [18196.988868] XFS (dm-3): EXPERIMENTAL exchange range feature enabled. Use at your own risk! [18196.988873] XFS (dm-3): EXPERIMENTAL parent pointer feature enabled. Use at your own risk! 
[18196.998845] XFS (dm-3): Mounting V5 Filesystem 6ade490d-15b0-43e5-9f17-db534769c746 [18197.193740] XFS (dm-3): Starting recovery (logdev: internal) [18197.254119] XFS (dm-3): Ending recovery (logdev: internal) [18197.280596] XFS (dm-3): Unmounting Filesystem 6ade490d-15b0-43e5-9f17-db534769c746 [18200.173003] XFS (dm-3): EXPERIMENTAL metadata directory tree feature enabled. Use at your own risk! [18200.176855] XFS (dm-3): EXPERIMENTAL exchange range feature enabled. Use at your own risk! [18200.176859] XFS (dm-3): EXPERIMENTAL parent pointer feature enabled. Use at your own risk! [18200.185721] XFS (dm-3): Mounting V5 Filesystem 6ade490d-15b0-43e5-9f17-db534769c746 [18200.370893] XFS (dm-3): Starting recovery (logdev: internal) [18200.430454] XFS (dm-3): Ending recovery (logdev: internal) [18200.462036] XFS (dm-3): Unmounting Filesystem 6ade490d-15b0-43e5-9f17-db534769c746 [18203.311440] XFS (dm-3): EXPERIMENTAL metadata directory tree feature enabled. Use at your own risk! [18203.311454] XFS (dm-3): EXPERIMENTAL exchange range feature enabled. Use at your own risk! [18203.311464] XFS (dm-3): EXPERIMENTAL parent pointer feature enabled. Use at your own risk! [18203.324374] XFS (dm-3): Mounting V5 Filesystem 6ade490d-15b0-43e5-9f17-db534769c746 [18203.437989] XFS (dm-3): Starting recovery (logdev: internal) [18203.491993] XFS (dm-3): Ending recovery (logdev: internal) [18203.517090] XFS (dm-3): Unmounting Filesystem 6ade490d-15b0-43e5-9f17-db534769c746 [18206.442639] XFS (dm-3): EXPERIMENTAL metadata directory tree feature enabled. Use at your own risk! [18206.444851] XFS (dm-3): EXPERIMENTAL exchange range feature enabled. Use at your own risk! [18206.444854] XFS (dm-3): EXPERIMENTAL parent pointer feature enabled. Use at your own risk! 
[18206.455415] XFS (dm-3): Mounting V5 Filesystem 6ade490d-15b0-43e5-9f17-db534769c746 [18206.600488] XFS (dm-3): Starting recovery (logdev: internal) [18206.642538] XFS (dm-3): Ending recovery (logdev: internal) [18206.673822] XFS (dm-3): Unmounting Filesystem 6ade490d-15b0-43e5-9f17-db534769c746 [18209.666477] XFS (dm-3): EXPERIMENTAL metadata directory tree feature enabled. Use at your own risk! [18209.678778] XFS (dm-3): EXPERIMENTAL exchange range feature enabled. Use at your own risk! [18209.678782] XFS (dm-3): EXPERIMENTAL parent pointer feature enabled. Use at your own risk! [18209.690805] XFS (dm-3): Mounting V5 Filesystem 6ade490d-15b0-43e5-9f17-db534769c746 [18209.859688] XFS (dm-3): Starting recovery (logdev: internal) [18209.923426] XFS (dm-3): Ending recovery (logdev: internal) [18209.947181] XFS (dm-3): Unmounting Filesystem 6ade490d-15b0-43e5-9f17-db534769c746 [18212.920991] XFS (dm-3): EXPERIMENTAL metadata directory tree feature enabled. Use at your own risk! [18212.921001] XFS (dm-3): EXPERIMENTAL exchange range feature enabled. Use at your own risk! [18212.921012] XFS (dm-3): EXPERIMENTAL parent pointer feature enabled. Use at your own risk! [18212.925332] XFS (dm-3): Mounting V5 Filesystem 6ade490d-15b0-43e5-9f17-db534769c746 [18213.067578] XFS (dm-3): Starting recovery (logdev: internal) [18213.138633] XFS (dm-3): Ending recovery (logdev: internal) [18213.161827] XFS (dm-3): Unmounting Filesystem 6ade490d-15b0-43e5-9f17-db534769c746 [18216.154862] XFS (dm-3): EXPERIMENTAL metadata directory tree feature enabled. Use at your own risk! [18216.156952] XFS (dm-3): EXPERIMENTAL exchange range feature enabled. Use at your own risk! [18216.157070] XFS (dm-3): EXPERIMENTAL parent pointer feature enabled. Use at your own risk! 
[18216.161145] XFS (dm-3): Mounting V5 Filesystem 6ade490d-15b0-43e5-9f17-db534769c746 [18216.333087] XFS (dm-3): Starting recovery (logdev: internal) [18216.389192] XFS (dm-3): Ending recovery (logdev: internal) [18216.410647] XFS (dm-3): Unmounting Filesystem 6ade490d-15b0-43e5-9f17-db534769c746 [18217.949035] BUG: sleeping function called from invalid context at mm/util.c:743 [18217.949047] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 35, name: kcompactd0 [18217.949056] preempt_count: 1, expected: 0 [18217.949058] RCU nest depth: 0, expected: 0 [18217.949060] Preemption disabled at: [18217.949062] [<fffffe0080339c98>] __buffer_migrate_folio+0xb8/0x2d0 [18217.949070] CPU: 0 UID: 0 PID: 35 Comm: kcompactd0 Not tainted 6.15.0-rc1-acha #rc1 PREEMPT 92ec4d9d73adc951fe6bbe0d3f3b75d35d67fded [18217.949074] Hardware name: QEMU KVM Virtual Machine, BIOS 1.6.6 08/22/2023 [18217.949075] Call trace: [18217.949076] show_stack+0x20/0x38 (C) [18217.949080] dump_stack_lvl+0x78/0x90 [18217.949083] dump_stack+0x18/0x28 [18217.949084] __might_resched+0x164/0x1d0 [18217.949086] folio_mc_copy+0x5c/0xa0 [18217.949089] __migrate_folio.constprop.0+0x70/0x1c8 [18217.949092] __buffer_migrate_folio+0x2bc/0x2d0 [18217.949094] buffer_migrate_folio_norefs+0x1c/0x30 [18217.949096] move_to_new_folio+0x70/0x1f0 [18217.949099] migrate_pages_batch+0x9c4/0xf20 [18217.949101] migrate_pages+0xb74/0xde8 [18217.949103] compact_zone+0x9ac/0xff0 [18217.949105] compact_node+0x9c/0x1a0 [18217.949107] kcompactd+0x38c/0x400 [18217.949108] kthread+0x144/0x210 [18217.949110] ret_from_fork+0x10/0x20 ^ permalink raw reply [flat|nested] 31+ messages in thread
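For context, the sleeping call in Darrick's trace is the cond_resched() between per-page copies in folio_mc_copy(), reached while __buffer_migrate_folio() has preemption disabled (it holds the mapping's private spinlock in the norefs path). A paraphrased sketch of the mm/util.c helper, not the verbatim source:

```c
/* Paraphrased sketch of folio_mc_copy() from mm/util.c -- not verbatim. */
int folio_mc_copy(struct folio *dst, struct folio *src)
{
	long nr = folio_nr_pages(src);
	long i = 0;

	for (;;) {
		/* Machine-check-aware copy of one page of the folio. */
		if (copy_mc_highpage(folio_page(dst, i), folio_page(src, i)))
			return -EHWPOISON;
		if (++i == nr)
			break;
		/*
		 * May sleep between page copies of a large folio -- which is
		 * invalid when the norefs buffer-head migration path calls in
		 * with a spinlock held, hence the splat above.
		 */
		cond_resched();
	}
	return 0;
}
```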
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c 2025-04-08 16:43 ` Darrick J. Wong @ 2025-04-08 17:06 ` Luis Chamberlain 2025-04-08 17:24 ` Luis Chamberlain 0 siblings, 1 reply; 31+ messages in thread From: Luis Chamberlain @ 2025-04-08 17:06 UTC (permalink / raw) To: Darrick J. Wong, David Bueso Cc: Jan Kara, Kefeng Wang, David Bueso, Tso Ted, Ritesh Harjani, Johannes Weiner, Oliver Sang, Matthew Wilcox, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, Dave Chinner, gost.dev, linux-fsdevel On Tue, Apr 08, 2025 at 09:43:07AM -0700, Darrick J. Wong wrote: > Hi Luis, > > I'm not sure if this is related, but I'm seeing the same "BUG: sleeping > function called from invalid context at mm/util.c:743" message when > running fstests on XFS. Nothing exciting with fstests here other than > the machine is arm64 with 64k basepages and 4k fsblock size: How exotic :D > MKFS_OPTIONS="-m metadir=1,autofsck=1,uquota,gquota,pquota" > > --D > > [18182.889554] run fstests generic/457 at 2025-04-07 23:06:25 Davidlohr and I have some fixes brewed up now; before we post, we just want to run one more test for metrics on success-rate analysis for folio migration. Other than that, given the exotic nature of your system we'll Cc you on preliminary patches, in case you can test whether they also fix your issue. They should, given your splat is on the buffer-head side of things! See the __buffer_migrate_folio() reference in the splat. A fun puzzle for the community is figuring out *why* oh why a large folio ended up being used on buffer-heads for your use case *without* an LBS (logical block size) device being present, as I assume you didn't have one, i.e. say an NVMe or virtio block device with logical block size > PAGE_SIZE. The area in question would trigger on folio migration *only* if you are migrating large buffer-head folios. 
We only create those if you have an LBS device and are leveraging the block device cache, or a filesystem with LBS buffer-heads (they don't exist yet other than the block device cache). Either way, the patches we have brewed up should fix this, regardless of the puzzle described above. We'll Cc you for testing before we post patches to address this. Luis ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c 2025-04-08 17:06 ` Luis Chamberlain @ 2025-04-08 17:24 ` Luis Chamberlain 2025-04-08 17:48 ` Darrick J. Wong 0 siblings, 1 reply; 31+ messages in thread From: Luis Chamberlain @ 2025-04-08 17:24 UTC (permalink / raw) To: Darrick J. Wong, David Bueso Cc: Jan Kara, Kefeng Wang, Tso Ted, Ritesh Harjani, Johannes Weiner, Oliver Sang, Matthew Wilcox, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, Dave Chinner, gost.dev, linux-fsdevel On Tue, Apr 8, 2025 at 10:06 AM Luis Chamberlain <mcgrof@kernel.org> wrote: > Fun > puzzle for the community is figuring out *why* oh why did a large folio > end up being used on buffer-heads for your use case *without* an LBS > device (logical block size) being present, as I assume you didn't have > one, ie say a nvme or virtio block device with logical block size > > PAGE_SIZE. The area in question would trigger on folio migration *only* > if you are migrating large buffer-head folios. We only create those To be clear, large folios for buffer-heads. > if > you have an LBS device and are leveraging the block device cache or a > filesystem with buffer-heads with LBS (they don't exist yet other than > the block device cache). Luis ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c 2025-04-08 17:24 ` Luis Chamberlain @ 2025-04-08 17:48 ` Darrick J. Wong 2025-04-08 17:51 ` Matthew Wilcox 2025-04-08 18:06 ` Luis Chamberlain 0 siblings, 2 replies; 31+ messages in thread From: Darrick J. Wong @ 2025-04-08 17:48 UTC (permalink / raw) To: Luis Chamberlain Cc: David Bueso, Jan Kara, Kefeng Wang, Tso Ted, Ritesh Harjani, Johannes Weiner, Oliver Sang, Matthew Wilcox, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, Dave Chinner, gost.dev, linux-fsdevel On Tue, Apr 08, 2025 at 10:24:40AM -0700, Luis Chamberlain wrote: > On Tue, Apr 8, 2025 at 10:06 AM Luis Chamberlain <mcgrof@kernel.org> wrote: > > Fun > > puzzle for the community is figuring out *why* oh why did a large folio > > end up being used on buffer-heads for your use case *without* an LBS > > device (logical block size) being present, as I assume you didn't have > > one, ie say a nvme or virtio block device with logical block size > > > PAGE_SIZE. The area in question would trigger on folio migration *only* > > if you are migrating large buffer-head folios. We only create those > > To be clear, large folios for buffer-heads. > > if > > you have an LBS device and are leveraging the block device cache or a > > filesystem with buffer-heads with LBS (they don't exist yet other than > > the block device cache). My guess is that udev or something tries to read the disk label in response to some uevent (mkfs, mount, unmount, etc), which creates a large folio because min_order > 0, and attaches a buffer head. There's a separate crash report that I'll cc you on. --D ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c 2025-04-08 17:48 ` Darrick J. Wong @ 2025-04-08 17:51 ` Matthew Wilcox 2025-04-08 18:02 ` Darrick J. Wong 2025-04-08 18:06 ` Luis Chamberlain 1 sibling, 1 reply; 31+ messages in thread From: Matthew Wilcox @ 2025-04-08 17:51 UTC (permalink / raw) To: Darrick J. Wong Cc: Luis Chamberlain, David Bueso, Jan Kara, Kefeng Wang, Tso Ted, Ritesh Harjani, Johannes Weiner, Oliver Sang, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, Dave Chinner, gost.dev, linux-fsdevel On Tue, Apr 08, 2025 at 10:48:55AM -0700, Darrick J. Wong wrote: > On Tue, Apr 08, 2025 at 10:24:40AM -0700, Luis Chamberlain wrote: > > On Tue, Apr 8, 2025 at 10:06 AM Luis Chamberlain <mcgrof@kernel.org> wrote: > > > Fun > > > puzzle for the community is figuring out *why* oh why did a large folio > > > end up being used on buffer-heads for your use case *without* an LBS > > > device (logical block size) being present, as I assume you didn't have > > > one, ie say a nvme or virtio block device with logical block size > > > > PAGE_SIZE. The area in question would trigger on folio migration *only* > > > if you are migrating large buffer-head folios. We only create those > > > > To be clear, large folios for buffer-heads. > > > if > > > you have an LBS device and are leveraging the block device cache or a > > > filesystem with buffer-heads with LBS (they don't exist yet other than > > > the block device cache). > > My guess is that udev or something tries to read the disk label in > response to some uevent (mkfs, mount, unmount, etc), which creates a > large folio because min_order > 0, and attaches a buffer head. There's > a separate crash report that I'll cc you on. 
But you said: > the machine is arm64 with 64k basepages and 4k fsblock size: so that shouldn't be using large folios because you should have set the order to 0. Right? Or did you mis-speak and use a 4K PAGE_SIZE kernel with a 64k fsblocksize? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c 2025-04-08 17:51 ` Matthew Wilcox @ 2025-04-08 18:02 ` Darrick J. Wong 2025-04-08 18:51 ` Matthew Wilcox 0 siblings, 1 reply; 31+ messages in thread From: Darrick J. Wong @ 2025-04-08 18:02 UTC (permalink / raw) To: Matthew Wilcox Cc: Luis Chamberlain, David Bueso, Jan Kara, Kefeng Wang, Tso Ted, Ritesh Harjani, Johannes Weiner, Oliver Sang, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, Dave Chinner, gost.dev, linux-fsdevel On Tue, Apr 08, 2025 at 06:51:14PM +0100, Matthew Wilcox wrote: > On Tue, Apr 08, 2025 at 10:48:55AM -0700, Darrick J. Wong wrote: > > On Tue, Apr 08, 2025 at 10:24:40AM -0700, Luis Chamberlain wrote: > > > On Tue, Apr 8, 2025 at 10:06 AM Luis Chamberlain <mcgrof@kernel.org> wrote: > > > > Fun > > > > puzzle for the community is figuring out *why* oh why did a large folio > > > > end up being used on buffer-heads for your use case *without* an LBS > > > > device (logical block size) being present, as I assume you didn't have > > > > one, ie say a nvme or virtio block device with logical block size > > > > > PAGE_SIZE. The area in question would trigger on folio migration *only* > > > > if you are migrating large buffer-head folios. We only create those > > > > > > To be clear, large folios for buffer-heads. > > > > if > > > > you have an LBS device and are leveraging the block device cache or a > > > > filesystem with buffer-heads with LBS (they don't exist yet other than > > > > the block device cache). > > > > My guess is that udev or something tries to read the disk label in > > response to some uevent (mkfs, mount, unmount, etc), which creates a > > large folio because min_order > 0, and attaches a buffer head. There's > > a separate crash report that I'll cc you on. 
> > But you said: > > > the machine is arm64 with 64k basepages and 4k fsblock size: > > so that shouldn't be using large folios because you should have set the > order to 0. Right? Or did you mis-speak and use a 4K PAGE_SIZE kernel > with a 64k fsblocksize? This particular kernel warning is arm64 with 64k base pages and a 4k fsblock size, and my suspicion is that udev/libblkid are creating the buffer heads or something weird like that. On x64 with 4k base pages, xfs/032 creates a filesystem with 64k sector size and there's an actual kernel crash resulting from a udev worker: https://lore.kernel.org/linux-fsdevel/20250408175125.GL6266@frogsfrogsfrogs/T/#u So I didn't misspeak, I just have two problems. I actually have four problems, but the others are loop device behavior changes. --D ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c 2025-04-08 18:02 ` Darrick J. Wong @ 2025-04-08 18:51 ` Matthew Wilcox 2025-04-08 19:13 ` Luis Chamberlain 2025-04-08 19:13 ` Luis Chamberlain 0 siblings, 2 replies; 31+ messages in thread From: Matthew Wilcox @ 2025-04-08 18:51 UTC (permalink / raw) To: Darrick J. Wong Cc: Luis Chamberlain, David Bueso, Jan Kara, Kefeng Wang, Tso Ted, Ritesh Harjani, Johannes Weiner, Oliver Sang, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, Dave Chinner, gost.dev, linux-fsdevel On Tue, Apr 08, 2025 at 11:02:40AM -0700, Darrick J. Wong wrote: > On Tue, Apr 08, 2025 at 06:51:14PM +0100, Matthew Wilcox wrote: > > On Tue, Apr 08, 2025 at 10:48:55AM -0700, Darrick J. Wong wrote: > > > On Tue, Apr 08, 2025 at 10:24:40AM -0700, Luis Chamberlain wrote: > > > > On Tue, Apr 8, 2025 at 10:06 AM Luis Chamberlain <mcgrof@kernel.org> wrote: > > > > > Fun > > > > > puzzle for the community is figuring out *why* oh why did a large folio > > > > > end up being used on buffer-heads for your use case *without* an LBS > > > > > device (logical block size) being present, as I assume you didn't have > > > > > one, ie say a nvme or virtio block device with logical block size > > > > > > PAGE_SIZE. The area in question would trigger on folio migration *only* > > > > > if you are migrating large buffer-head folios. We only create those > > > > > > > > To be clear, large folios for buffer-heads. > > > > > if > > > > > you have an LBS device and are leveraging the block device cache or a > > > > > filesystem with buffer-heads with LBS (they don't exist yet other than > > > > > the block device cache). 
> > > > > > My guess is that udev or something tries to read the disk label in > > > response to some uevent (mkfs, mount, unmount, etc), which creates a > > > large folio because min_order > 0, and attaches a buffer head. There's > > > a separate crash report that I'll cc you on. > > > > But you said: > > > > > the machine is arm64 with 64k basepages and 4k fsblock size: > > > > so that shouldn't be using large folios because you should have set the > > order to 0. Right? Or did you mis-speak and use a 4K PAGE_SIZE kernel > > with a 64k fsblocksize? > > This particular kernel warning is arm64 with 64k base pages and a 4k > fsblock size, and my suspicion is that udev/libblkid are creating the > buffer heads or something weird like that. > > On x64 with 4k base pages, xfs/032 creates a filesystem with 64k sector > size and there's an actual kernel crash resulting from a udev worker: > https://lore.kernel.org/linux-fsdevel/20250408175125.GL6266@frogsfrogsfrogs/T/#u > > So I didn't misspeak, I just have two problems. I actually have four > problems, but the others are loop device behavior changes. Right, but this warning only triggers for large folios. So somehow we've got a multi-page folio in the bdev's page cache. Ah. I see. block/bdev.c: mapping_set_folio_min_order(BD_INODE(bdev)->i_mapping, so we're telling the bdev that it can go up to MAX_PAGECACHE_ORDER. And then we call readahead, which will happily put order-2 folios in the pagecache because of my bug that we've never bothered fixing. We should probably fix that now, but as a temporary measure if you'd like to put: mapping_set_folio_order_range(BD_INODE(bdev)->i_mapping, min, min) instead of the mapping_set_folio_min_order(), that would make the bug no longer appear for you. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c 2025-04-08 18:51 ` Matthew Wilcox @ 2025-04-08 19:13 ` Luis Chamberlain 2025-04-08 19:13 ` Luis Chamberlain 1 sibling, 0 replies; 31+ messages in thread From: Luis Chamberlain @ 2025-04-08 19:13 UTC (permalink / raw) To: Matthew Wilcox Cc: Darrick J. Wong, David Bueso, Jan Kara, Kefeng Wang, Tso Ted, Ritesh Harjani, Johannes Weiner, Oliver Sang, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, Dave Chinner, gost.dev, linux-fsdevel On Tue, Apr 08, 2025 at 07:51:03PM +0100, Matthew Wilcox wrote: > On Tue, Apr 08, 2025 at 11:02:40AM -0700, Darrick J. Wong wrote: > > On Tue, Apr 08, 2025 at 06:51:14PM +0100, Matthew Wilcox wrote: > > > On Tue, Apr 08, 2025 at 10:48:55AM -0700, Darrick J. Wong wrote: > > > > On Tue, Apr 08, 2025 at 10:24:40AM -0700, Luis Chamberlain wrote: > > > > > On Tue, Apr 8, 2025 at 10:06 AM Luis Chamberlain <mcgrof@kernel.org> wrote: > > > > > > Fun > > > > > > puzzle for the community is figuring out *why* oh why did a large folio > > > > > > end up being used on buffer-heads for your use case *without* an LBS > > > > > > device (logical block size) being present, as I assume you didn't have > > > > > > one, ie say a nvme or virtio block device with logical block size > > > > > > > PAGE_SIZE. The area in question would trigger on folio migration *only* > > > > > > if you are migrating large buffer-head folios. We only create those > > > > > > > > > > To be clear, large folios for buffer-heads. > > > > > > if > > > > > > you have an LBS device and are leveraging the block device cache or a > > > > > > filesystem with buffer-heads with LBS (they don't exist yet other than > > > > > > the block device cache). 
> > > > > > > > My guess is that udev or something tries to read the disk label in > > > > response to some uevent (mkfs, mount, unmount, etc), which creates a > > > > large folio because min_order > 0, and attaches a buffer head. There's > > > > a separate crash report that I'll cc you on. > > > > > > But you said: > > > > > > > the machine is arm64 with 64k basepages and 4k fsblock size: > > > > > > so that shouldn't be using large folios because you should have set the > > > order to 0. Right? Or did you mis-speak and use a 4K PAGE_SIZE kernel > > > with a 64k fsblocksize? > > > > This particular kernel warning is arm64 with 64k base pages and a 4k > > fsblock size, and my suspicion is that udev/libblkid are creating the > > buffer heads or something weird like that. > > > > On x64 with 4k base pages, xfs/032 creates a filesystem with 64k sector > > size and there's an actual kernel crash resulting from a udev worker: > > https://lore.kernel.org/linux-fsdevel/20250408175125.GL6266@frogsfrogsfrogs/T/#u > > > > So I didn't misspeak, I just have two problems. I actually have four > > problems, but the others are loop device behavior changes. > > Right, but this warning only triggers for large folios. So somehow > we've got a multi-page folio in the bdev's page cache. > > Ah. I see. > > block/bdev.c: mapping_set_folio_min_order(BD_INODE(bdev)->i_mapping, > > so we're telling the bdev that it can go up to MAX_PAGECACHE_ORDER. Ah yes silly me that would explain the large folios without LBS devices. > And then we call readahead, which will happily put order-2 folios > in the pagecache because of my bug that we've never bothered fixing. > > We should probably fix that now, but as a temporary measure if > you'd like to put: > > mapping_set_folio_order_range(BD_INODE(bdev)->i_mapping, min, min) > > instead of the mapping_set_folio_min_order(), that would make the bug > no longer appear for you. Agreed. Luis ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c 2025-04-08 18:51 ` Matthew Wilcox 2025-04-08 19:13 ` Luis Chamberlain @ 2025-04-08 19:13 ` Luis Chamberlain 1 sibling, 0 replies; 31+ messages in thread From: Luis Chamberlain @ 2025-04-08 19:13 UTC (permalink / raw) To: Matthew Wilcox Cc: Darrick J. Wong, David Bueso, Jan Kara, Kefeng Wang, Tso Ted, Ritesh Harjani, Johannes Weiner, Oliver Sang, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, Dave Chinner, gost.dev, linux-fsdevel On Tue, Apr 08, 2025 at 07:51:03PM +0100, Matthew Wilcox wrote: > And then we call readahead, which will happily put order-2 folios > in the pagecache because of my bug that we've never bothered fixing. What was that BTW? Luis ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c 2025-04-08 17:48 ` Darrick J. Wong 2025-04-08 17:51 ` Matthew Wilcox @ 2025-04-08 18:06 ` Luis Chamberlain 1 sibling, 0 replies; 31+ messages in thread From: Luis Chamberlain @ 2025-04-08 18:06 UTC (permalink / raw) To: Darrick J. Wong Cc: David Bueso, Jan Kara, Kefeng Wang, Tso Ted, Ritesh Harjani, Johannes Weiner, Oliver Sang, Matthew Wilcox, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, Dave Chinner, gost.dev, linux-fsdevel On Tue, Apr 08, 2025 at 10:48:55AM -0700, Darrick J. Wong wrote: > On Tue, Apr 08, 2025 at 10:24:40AM -0700, Luis Chamberlain wrote: > > On Tue, Apr 8, 2025 at 10:06 AM Luis Chamberlain <mcgrof@kernel.org> wrote: > > > Fun > > > puzzle for the community is figuring out *why* oh why did a large folio > > > end up being used on buffer-heads for your use case *without* an LBS > > > device (logical block size) being present, as I assume you didn't have > > > one, ie say a nvme or virtio block device with logical block size > > > > PAGE_SIZE. The area in question would trigger on folio migration *only* > > > if you are migrating large buffer-head folios. We only create those > > > > To be clear, large folios for buffer-heads. > > > if > > > you have an LBS device and are leveraging the block device cache or a > > > filesystem with buffer-heads with LBS (they don't exist yet other than > > > the block device cache). > > My guess is that udev or something tries to read the disk label in > response to some uevent (mkfs, mount, unmount, etc), which creates a > large folio because min_order > 0, and attaches a buffer head. There's > a separate crash report that I'll cc you on. OK so as willy pointed out I buy that for x86_64 *iff* we do already have opportunistic large folio support for the buffer-head read/write path. 
But also, I don't think we enable large folios yet on the block device cache aops unless we have a min order block device... so what gives? Luis ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c 2025-03-18 8:15 ` [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c Luis Chamberlain 2025-03-18 14:37 ` Matthew Wilcox @ 2025-03-20 1:24 ` Lai, Yi 1 sibling, 0 replies; 31+ messages in thread From: Lai, Yi @ 2025-03-20 1:24 UTC (permalink / raw) To: Luis Chamberlain Cc: Oliver Sang, David Hildenbrand, Alistair Popple, linux-mm, Christian Brauner, Hannes Reinecke, oe-lkp, lkp, Matthew Wilcox (Oracle), John Garry, linux-block, ltp, Pankaj Raghav, Daniel Gomez, yi1.lai On Tue, Mar 18, 2025 at 01:15:33AM -0700, Luis Chamberlain wrote: > On Tue, Mar 18, 2025 at 01:28:20PM +0800, Oliver Sang wrote: > > hi, Christian Brauner, > > > > On Tue, Mar 11, 2025 at 01:10:43PM +0100, Christian Brauner wrote: > > > On Mon, Mar 10, 2025 at 03:43:49PM +0800, kernel test robot wrote: > > > > > > > > > > > > Hello, > > > > > > > > kernel test robot noticed "BUG:sleeping_function_called_from_invalid_context_at_mm/util.c" on: > > > > > > > > commit: 3c20917120ce61f2a123ca0810293872f4c6b5a4 ("block/bdev: enable large folio support for large logical block sizes") > > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master > > > > > > Is this also already fixed by: > > > > > > commit a64e5a596067 ("bdev: add back PAGE_SIZE block size validation for sb_set_blocksize()") > > > > > > ? > > > > sorry for late. > > > > commit a64e5a596067 cannot fix the issue. one dmesg is attached FYI. > > > > we also tried to check linux-next/master tip, but neither below one can boot > > successfully in our env which we need further check. > > > > da920b7df70177 (tag: next-20250314, linux-next/master) Add linux-next specific files for 20250314 > > > > e94bd4ec45ac1 (tag: next-20250317, linux-next/master) Add linux-next specific files for 20250317 > > > > so we are not sure the status of latest linux-next/master. 
> >
> > if you want us to check other commit or other patches, please let us know. thanks!
>
> I cannot reproduce the issue by running the LTP test manually in a loop
> for a long time:
>
> export LTP_RUNTIME_MUL=2
>
> while true; do \
> ./testcases/kernel/syscalls/close_range/close_range01; done
>
> What's the failure rate of just running the test alone above?
> Does it always fail on this system? Is this a deterministic failure
> or does it have a lower failure rate?
>

Hi Luis,

Greetings!

I used Syzkaller and found that this issue can also be reproduced with the Syzkaller reproducer binary. All detailed info can be found at:
https://github.com/laifryiee/syzkaller_logs/tree/main/250320_033346_folio_mc_copy

Syzkaller repro code:
https://github.com/laifryiee/syzkaller_logs/tree/main/250320_033346_folio_mc_copy/repro.c
Syzkaller repro syscall steps:
https://github.com/laifryiee/syzkaller_logs/tree/main/250320_033346_folio_mc_copy/repro.prog
Syzkaller report:
https://github.com/laifryiee/syzkaller_logs/tree/main/250320_033346_folio_mc_copy/repro.report
Kconfig (make olddefconfig):
https://github.com/laifryiee/syzkaller_logs/tree/main/250320_033346_folio_mc_copy/kconfig_origin
Bisect info:
https://github.com/laifryiee/syzkaller_logs/tree/main/250320_033346_folio_mc_copy/bisect_info.log
bzImage:
https://github.com/laifryiee/syzkaller_logs/raw/refs/heads/main/250320_033346_folio_mc_copy/bzImage_e94bd4ec45ac156616da285a0bf03056cd7430fc
Issue dmesg:
https://github.com/laifryiee/syzkaller_logs/blob/main/250320_033346_folio_mc_copy/e94bd4ec45ac156616da285a0bf03056cd7430fc_dmesg.log

After bisection, the first bad commit is:
"
3c20917120ce block/bdev: enable large folio support for large logical block sizes
"

"
[   23.399326] dump_stack+0x19/0x20
[   23.399332] __might_resched+0x37b/0x5a0
[   23.399345] ? __kasan_check_read+0x15/0x20
[   23.399354] folio_mc_copy+0x111/0x240
[   23.399368] __migrate_folio.constprop.0+0x173/0x3c0
[   23.399377] __buffer_migrate_folio+0x6a2/0x7b0
[   23.399389] buffer_migrate_folio_norefs+0x3d/0x50
[   23.399398] move_to_new_folio+0x153/0x5b0
[   23.399403] ? __pfx_buffer_migrate_folio_norefs+0x10/0x10
[   23.399412] migrate_pages_batch+0x19e0/0x2890
[   23.399424] ? __pfx_compaction_free+0x10/0x10
[   23.399444] ? __pfx_migrate_pages_batch+0x10/0x10
[   23.399450] ? __kasan_check_read+0x15/0x20
[   23.399455] ? __lock_acquire+0xdb6/0x5d60
[   23.399475] ? __pfx___lock_acquire+0x10/0x10
[   23.399486] migrate_pages+0x18de/0x2450
[   23.399500] ? __pfx_compaction_free+0x10/0x10
[   23.399505] ? __pfx_compaction_alloc+0x10/0x10
[   23.399514] ? __pfx_migrate_pages+0x10/0x10
[   23.399519] ? __this_cpu_preempt_check+0x21/0x30
[   23.399533] ? rcu_is_watching+0x19/0xc0
[   23.399546] ? isolate_migratepages_block+0x2253/0x41c0
[   23.399565] ? __pfx_isolate_migratepages_block+0x10/0x10
[   23.399578] compact_zone+0x1d66/0x4480
[   23.399600] ? perf_trace_lock+0xe0/0x4f0
[   23.399612] ? __pfx_compact_zone+0x10/0x10
[   23.399617] ? __pfx_perf_trace_lock+0x10/0x10
[   23.399627] ? __pfx_lock_acquire+0x10/0x10
[   23.399639] compact_node+0x190/0x2c0
[   23.399647] ? __pfx_compact_node+0x10/0x10
[   23.399653] ? __pfx_lock_release+0x10/0x10
[   23.399678] ? _raw_spin_unlock_irqrestore+0x45/0x70
[   23.399694] kcompactd+0x784/0xde0
[   23.399705] ? __pfx_kcompactd+0x10/0x10
[   23.399711] ? lockdep_hardirqs_on+0x89/0x110
[   23.399721] ? __pfx_autoremove_wake_function+0x10/0x10
[   23.399731] ? __sanitizer_cov_trace_const_cmp1+0x1e/0x30
[   23.399742] ? __kthread_parkme+0x15d/0x230
[   23.399753] ? __pfx_kcompactd+0x10/0x10
[   23.399761] kthread+0x444/0x980
[   23.399769] ? __pfx_kthread+0x10/0x10
[   23.399776] ? _raw_spin_unlock_irq+0x3c/0x60
[   23.399784] ? __pfx_kthread+0x10/0x10
[   23.399792] ret_from_fork+0x56/0x90
[   23.399802] ? __pfx_kthread+0x10/0x10
[   23.399809] ret_from_fork_asm+0x1a/0x30
[   23.399827] </TASK>
"

Hope this could be insightful to you.

Regards,
Yi Lai

---

If you don't need the following environment to reproduce the problem, or if you already have a reproduction environment, please ignore the following information.

How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh  // it needs qemu-system-x86_64 and I used v7.1.0
// start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
// You could change the bzImage_xxx as you want
// Maybe you need to remove line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for different qemu version

You can use the command below to log in; there is no password for root:
ssh -p 10023 root@localhost

After logging in to the VM (virtual machine) successfully, you can transfer the reproducer binary to the VM as below and reproduce the problem in the VM:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/

Get the bzImage for the target kernel:
Please use the target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage  // x should be equal to or less than the number of CPUs your machine has

Point start3.sh above at the resulting bzImage file to load the target kernel in the VM.

Tips: if you already have qemu-system-x86_64, please ignore the info below.
If you want to install qemu v7.1.0:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
yum -y install libslirp-devel.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
make
make install

> I also can't see how the patch ("block/bdev: enable large folio
> support for large logical block sizes") would trigger this.
>
> You could try this patch but ...
>
> https://lore.kernel.org/all/20250312050028.1784117-1-mcgrof@kernel.org/
>
> we decided this is not right and not needed, and if we have a buggy
> block driver we can address that.
>
> I just can't see how this LTP test is actually doing anything funky with block
> devices at all.
>
> The associated sleeping while atomic warning is triggered during
> compaction though:
>
> [  218.143642][  T299] Architecture: x86_64
> [  218.143659][  T299]
> [  218.427851][   T51] BUG: sleeping function called from invalid context at mm/util.c:901
> [  218.435981][   T51] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 51, name: kcompactd0
> [  218.444773][   T51] preempt_count: 1, expected: 0
> [  218.449601][   T51] RCU nest depth: 0, expected: 0
> [  218.454476][   T51] CPU: 2 UID: 0 PID: 51 Comm: kcompactd0 Tainted: G S 6.14.0-rc1-00006-g3c20917120ce #1
> [  218.454486][   T51] Tainted: [S]=CPU_OUT_OF_SPEC
> [  218.454488][   T51] Hardware name: Hewlett-Packard HP Pro 3340 MT/17A1, BIOS 8.07 01/24/2013
> [  218.454492][   T51] Call Trace:
> [  218.454495][   T51]  <TASK>
> [  218.454498][   T51]  dump_stack_lvl+0x4f/0x70
> [  218.454508][   T51]  __might_resched+0x2c6/0x450
> [  218.454517][   T51]  folio_mc_copy+0xca/0x1f0
> [  218.454525][   T51]  ? _raw_spin_lock+0x81/0xe0
> [  218.454532][   T51]  __migrate_folio+0x11a/0x2d0
> [  218.454541][   T51]  __buffer_migrate_folio+0x558/0x660
> [  218.454548][   T51]  move_to_new_folio+0xf5/0x410
> [  218.454555][   T51]  migrate_folio_move+0x211/0x770
> [  218.454562][   T51]  ? __pfx_compaction_free+0x10/0x10
> [  218.454572][   T51]  ? __pfx_migrate_folio_move+0x10/0x10
> [  218.454578][   T51]  ? compaction_alloc_noprof+0x441/0x720
> [  218.454587][   T51]  ? __pfx_compaction_alloc+0x10/0x10
> [  218.454594][   T51]  ? __pfx_compaction_free+0x10/0x10
> [  218.454601][   T51]  ? __pfx_compaction_free+0x10/0x10
> [  218.454607][   T51]  ? migrate_folio_unmap+0x329/0x890
> [  218.454614][   T51]  migrate_pages_batch+0xddf/0x1810
> [  218.454621][   T51]  ? __pfx_compaction_free+0x10/0x10
> [  218.454631][   T51]  ? __pfx_migrate_pages_batch+0x10/0x10
> [  218.454638][   T51]  ? cgroup_rstat_updated+0xf1/0x860
> [  218.454648][   T51]  migrate_pages_sync+0x10c/0x8e0
> [  218.454656][   T51]  ? __pfx_compaction_alloc+0x10/0x10
> [  218.454662][   T51]  ? __pfx_compaction_free+0x10/0x10
> [  218.454669][   T51]  ? lru_gen_del_folio+0x383/0x820
> [  218.454677][   T51]  ? __pfx_migrate_pages_sync+0x10/0x10
> [  218.454683][   T51]  ? set_pfnblock_flags_mask+0x179/0x220
> [  218.454691][   T51]  ? __pfx_lru_gen_del_folio+0x10/0x10
> [  218.454699][   T51]  ? __pfx_compaction_alloc+0x10/0x10
> [  218.454705][   T51]  ? __pfx_compaction_free+0x10/0x10
> [  218.454713][   T51]  migrate_pages+0x846/0xe30
> [  218.454720][   T51]  ? __pfx_compaction_alloc+0x10/0x10
> [  218.454726][   T51]  ? __pfx_compaction_free+0x10/0x10
> [  218.454733][   T51]  ? __pfx_buffer_migrate_folio_norefs+0x10/0x10
> [  218.454740][   T51]  ? __pfx_migrate_pages+0x10/0x10
> [  218.454748][   T51]  ? isolate_migratepages+0x32d/0xbd0
> [  218.454757][   T51]  compact_zone+0x9e1/0x1680
> [  218.454767][   T51]  ? __pfx_compact_zone+0x10/0x10
> [  218.454774][   T51]  ? _raw_spin_lock_irqsave+0x87/0xe0
> [  218.454780][   T51]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> [  218.454788][   T51]  compact_node+0x159/0x250
> [  218.454795][   T51]  ? __pfx_compact_node+0x10/0x10
> [  218.454807][   T51]  ? __pfx_extfrag_for_order+0x10/0x10
> [  218.454814][   T51]  ? __pfx_mutex_unlock+0x10/0x10
> [  218.454822][   T51]  ? finish_wait+0xd1/0x280
> [  218.454831][   T51]  kcompactd+0x582/0x960
> [  218.454839][   T51]  ? __pfx_kcompactd+0x10/0x10
> [  218.454846][   T51]  ? _raw_spin_lock_irqsave+0x87/0xe0
> [  218.454852][   T51]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> [  218.454858][   T51]  ? __pfx_autoremove_wake_function+0x10/0x10
> [  218.454867][   T51]  ? __kthread_parkme+0xba/0x1e0
> [  218.454874][   T51]  ? __pfx_kcompactd+0x10/0x10
> [  218.454880][   T51]  kthread+0x3a1/0x770
> [  218.454887][   T51]  ? __pfx_kthread+0x10/0x10
> [  218.454895][   T51]  ? __pfx_kthread+0x10/0x10
> [  218.454902][   T51]  ret_from_fork+0x30/0x70
> [  218.454910][   T51]  ? __pfx_kthread+0x10/0x10
> [  218.454915][   T51]  ret_from_fork_asm+0x1a/0x30
> [  218.454924][   T51]  </TASK>
>
> So the only thing I can think the patch can do is
> push more large folios to be used and so compaction can be a secondary
> effect which managed to trigger another mm issue. I know there was a
> recent migration fix but I can't see the relationship at all either.
>
>   Luis

^ permalink raw reply	[flat|nested] 31+ messages in thread
end of thread, other threads:[~2025-04-08 19:13 UTC | newest]
Thread overview: 31+ messages
[not found] <202503101536.27099c77-lkp@intel.com>
[not found] ` <20250311-testphasen-behelfen-09b950bbecbf@brauner>
[not found] ` <Z9kEdPLNT8SOyOQT@xsang-OptiPlex-9020>
2025-03-18 8:15 ` [linux-next:master] [block/bdev] 3c20917120: BUG:sleeping_function_called_from_invalid_context_at_mm/util.c Luis Chamberlain
2025-03-18 14:37 ` Matthew Wilcox
2025-03-18 23:17 ` Luis Chamberlain
2025-03-19 2:58 ` Matthew Wilcox
2025-03-19 16:55 ` Luis Chamberlain
2025-03-19 19:16 ` Luis Chamberlain
2025-03-19 19:24 ` Matthew Wilcox
2025-03-20 12:11 ` Luis Chamberlain
2025-03-20 12:18 ` Luis Chamberlain
2025-03-22 23:14 ` Johannes Weiner
2025-03-23 1:02 ` Luis Chamberlain
2025-03-23 7:07 ` Luis Chamberlain
2025-03-25 6:52 ` Oliver Sang
2025-03-28 1:44 ` Luis Chamberlain
2025-03-28 4:21 ` Luis Chamberlain
2025-03-28 9:47 ` Luis Chamberlain
2025-03-28 19:09 ` Luis Chamberlain
2025-03-29 0:08 ` Luis Chamberlain
2025-03-29 1:06 ` Luis Chamberlain
2025-03-31 7:45 ` Sebastian Andrzej Siewior
2025-04-08 16:43 ` Darrick J. Wong
2025-04-08 17:06 ` Luis Chamberlain
2025-04-08 17:24 ` Luis Chamberlain
2025-04-08 17:48 ` Darrick J. Wong
2025-04-08 17:51 ` Matthew Wilcox
2025-04-08 18:02 ` Darrick J. Wong
2025-04-08 18:51 ` Matthew Wilcox
2025-04-08 19:13 ` Luis Chamberlain
2025-04-08 19:13 ` Luis Chamberlain
2025-04-08 18:06 ` Luis Chamberlain
2025-03-20 1:24 ` Lai, Yi