Btrfs crash on generic/437 on x86_64

linux-mm.kvack.org archive mirror

* Btrfs crash on generic/437 on x86_64
@ 2025-02-10  3:10 Qu Wenruo
  2025-02-10  3:31 ` Matthew Wilcox
  0 siblings, 1 reply; 4+ messages in thread
From: Qu Wenruo @ 2025-02-10  3:10 UTC (permalink / raw)
  To: Linux Memory Management List, linux-btrfs, fstests

Hi,

Normally a crash like this would not be worth reporting, and we would
just dig in and fix it.

But this one is a little weird: we got a folio which is still mapped
during filemap_unaccount_folio().

I can reproduce it with generic/437 using the default mount options; so
far 32 runs are enough to trigger it reliably.

I have not yet been able to reproduce it on aarch64 (64K and 4K page
sizes tried so far).

I'm already trying to bisect the bug; so far it is still reproducible
at 6.14-rc1.

Any advice/clue would be appreciated.

Dmesg:

[   58.305921] BTRFS info (device dm-0): using free-space-tree
[   58.319296] run fstests generic/437 at 2025-02-10 13:24:19
[   59.283069] BUG: Bad rss-counter state mm:0000000048578720 type:MM_FILEPAGES val:1
[   59.296485] page: refcount:3 mapcount:1 mapping:00000000828f872f index:0x0 pfn:0x13ab4f
[   59.297223] memcg:ffff888105a32000
[   59.297533] aops:btrfs_aops [btrfs] ino:1031b
[   59.298188] flags: 0x2ffff800000002d(locked|referenced|uptodate|lru|node=0|zone=2|lastcpupid=0x1ffff)
[   59.298955] raw: 02ffff800000002d ffffea0004184948 ffffea0004c40c88 ffff888107c7a2b8
[   59.299607] raw: 0000000000000000 0000000000000000 0000000300000000 ffff888105a32000
[   59.300261] page dumped because: VM_BUG_ON_FOLIO(folio_mapped(folio))
[   59.300846] ------------[ cut here ]------------
[   59.301256] kernel BUG at mm/filemap.c:154!
[   59.301635] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[   59.302144] CPU: 4 UID: 0 PID: 17354 Comm: umount Tainted: G
  OE      6.14.0-rc1-custom+ #211
[   59.302953] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[   59.303447] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
[   59.304291] RIP: 0010:filemap_unaccount_folio+0x153/0x1f0
[   59.305224] Code: b0 f0 00 00 00 e9 5d f6 00 00 48 c7 c6 80 1b 43 82 48 89 df e8 ae 89 04 00 0f 0b 48 c7 c6 10 d8 44 82 48 89 df e8 9d 89 04 00 <0f> 0b 48 8b 06 a8 40 74 4c 8b 43 50 e9 ce fe ff ff 48 c7 c6 80 1b
[   59.308807] RSP: 0018:ffffc90005387a18 EFLAGS: 00010046
[   59.309382] RAX: 0000000000000039 RBX: ffffea0004ead3c0 RCX: 0000000000000027
[   59.310313] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff888277c21880
[   59.311856] RBP: ffff888107c7a2b8 R08: ffffffff82cad0a8 R09: 00000000fffff000
[   59.312879] R10: ffffffff82c55100 R11: 6d75642065676170 R12: 0000000000000001
[   59.313607] R13: ffffffffffffffff R14: ffffc90005387ad8 R15: ffff888107c7a2c0
[   59.314347] FS:  00007ff0455f2b80(0000) GS:ffff888277c00000(0000) knlGS:0000000000000000
[   59.315159] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   59.315744] CR2: 000055e761f94f58 CR3: 0000000166a44000 CR4: 00000000000006f0
[   59.316476] Call Trace:
[   59.316749]  <TASK>
[   59.316986]  ? __die_body.cold+0x19/0x24
[   59.317401]  ? die+0x2e/0x50
[   59.317704]  ? do_trap+0xca/0x110
[   59.318062]  ? do_error_trap+0x6a/0x90
[   59.318464]  ? filemap_unaccount_folio+0x153/0x1f0
[   59.318990]  ? exc_invalid_op+0x50/0x70
[   59.319416]  ? filemap_unaccount_folio+0x153/0x1f0
[   59.319933]  ? asm_exc_invalid_op+0x1a/0x20
[   59.320395]  ? filemap_unaccount_folio+0x153/0x1f0
[   59.320918]  ? filemap_unaccount_folio+0x153/0x1f0
[   59.321408]  delete_from_page_cache_batch+0x95/0x3c0
[   59.321912]  truncate_inode_pages_range+0x142/0x570
[   59.322413]  btrfs_evict_inode+0x8b/0x390 [btrfs]
[   59.323055]  evict+0x14f/0x2d0
[   59.323374]  evict_inodes+0x19c/0x240
[   59.323748]  generic_shutdown_super+0x42/0x100
[   59.324203]  kill_anon_super+0x16/0x40
[   59.324588]  btrfs_kill_super+0x16/0x20 [btrfs]
[   59.325094]  deactivate_locked_super+0x33/0xb0
[   59.325564]  cleanup_mnt+0xba/0x150
[   59.325926]  task_work_run+0x5c/0x90
[   59.326299]  syscall_exit_to_user_mode+0x129/0x140
[   59.326781]  do_syscall_64+0x5b/0x120
[   59.327162]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   59.327676] RIP: 0033:0x7ff0457471cb
[   59.328056] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 11 cb 0c 00 f7 d8
[   59.329814] RSP: 002b:00007ffc65f95d28 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[   59.330475] RAX: 0000000000000000 RBX: 000055e761f87420 RCX: 00007ff0457471cb
[   59.331077] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055e761f8cb00
[   59.331690] RBP: 00007ffc65f95e00 R08: 000055e761f87010 R09: 0000000000000007
[   59.332297] R10: 0000000000000000 R11: 0000000000000246 R12: 000055e761f87528
[   59.332896] R13: 0000000000000000 R14: 000055e761f8cb00 R15: 000055e761f87830
[   59.333507]  </TASK>
[   59.333700] Modules linked in: crc32c_generic btrfs(OE) vfat fat blake2b_generic xor zstd_compress iTCO_wdt iTCO_vendor_support psmouse i2c_i801 pcspkr i2c_smbus lpc_ich intel_agp joydev intel_gtt mousedev agpgart raid6_pq drm fuse loop qemu_fw_cfg ext4 crc16 mbcache jbd2 dm_mod virtio_net net_failover virtio_rng failover virtio_balloon virtio_scsi virtio_console rng_core virtio_blk virtio_pci serio_raw virtio_pci_legacy_dev usbhid virtio_pci_modern_dev [last unloaded: btrfs]
[   59.337352] Dumping ftrace buffer:
[   59.337715]    (ftrace buffer empty)
[   59.338098] ---[ end trace 0000000000000000 ]---
[   59.351979] pstore: backend (efi_pstore) writing error (-28)
[   59.352590] RIP: 0010:filemap_unaccount_folio+0x153/0x1f0
[   59.353182] Code: b0 f0 00 00 00 e9 5d f6 00 00 48 c7 c6 80 1b 43 82 48 89 df e8 ae 89 04 00 0f 0b 48 c7 c6 10 d8 44 82 48 89 df e8 9d 89 04 00 <0f> 0b 48 8b 06 a8 40 74 4c 8b 43 50 e9 ce fe ff ff 48 c7 c6 80 1b
[   59.355140] RSP: 0018:ffffc90005387a18 EFLAGS: 00010046
[   59.355702] RAX: 0000000000000039 RBX: ffffea0004ead3c0 RCX: 0000000000000027
[   59.356429] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff888277c21880
[   59.357131] RBP: ffff888107c7a2b8 R08: ffffffff82cad0a8 R09: 00000000fffff000
[   59.357847] R10: ffffffff82c55100 R11: 6d75642065676170 R12: 0000000000000001
[   59.358558] R13: ffffffffffffffff R14: ffffc90005387ad8 R15: ffff888107c7a2c0
[   59.359274] FS:  00007ff0455f2b80(0000) GS:ffff888277c00000(0000) knlGS:0000000000000000
[   59.360073] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   59.360676] CR2: 000055e761f94f58 CR3: 0000000166a44000 CR4: 00000000000006f0
[   59.361445] Kernel panic - not syncing: Fatal exception
[   59.362127] Dumping ftrace buffer:
[   59.362498]    (ftrace buffer empty)
[   59.362891] Kernel Offset: disabled
[   59.376221] Rebooting in 5 seconds..



* Re: Btrfs crash on generic/437 on x86_64
  2025-02-10  3:10 Btrfs crash on generic/437 on x86_64 Qu Wenruo
@ 2025-02-10  3:31 ` Matthew Wilcox
  2025-02-10  3:43   ` Qu Wenruo
  0 siblings, 1 reply; 4+ messages in thread
From: Matthew Wilcox @ 2025-02-10  3:31 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Linux Memory Management List, linux-btrfs, fstests

On Mon, Feb 10, 2025 at 01:40:16PM +1030, Qu Wenruo wrote:
> But this one is a little weird: we got a folio which is still mapped
> during filemap_unaccount_folio().
>
> I can reproduce it with generic/437 using the default mount options; so
> far 32 runs are enough to trigger it reliably.
>
> I have not yet been able to reproduce it on aarch64 (64K and 4K page
> sizes tried so far).
>
> I'm already trying to bisect the bug; so far it is still reproducible
> at 6.14-rc1.
> 
> Any advice/clue would be appreciated.
> 
> Dmesg:
> 
> [   58.305921] BTRFS info (device dm-0): using free-space-tree
> [   58.319296] run fstests generic/437 at 2025-02-10 13:24:19
> [   59.283069] BUG: Bad rss-counter state mm:0000000048578720
> type:MM_FILEPAGES val:1

This is the original problem; all else is a consequence.  We're calling
check_mm() in __mmdrop() -- i.e. we're dropping the last reference to an
mm, and its counters show one page is still mapped.  And it's a file
page.  (now see below for the consequence)
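
As a rough illustration of the accounting described above, here is a toy
userspace model, not the kernel code. Each mapping of a file page bumps a
per-mm counter, each unmapping drops it, and teardown reports any counter
that is not zero. The class and method names are hypothetical; only the
message format is taken from the dmesg line quoted above.

```python
# Toy model of per-mm RSS accounting; names are hypothetical, not kernel APIs.
from collections import Counter


class ToyMm:
    def __init__(self, ident):
        self.ident = ident
        self.rss = Counter()  # counter type -> pages currently mapped

    def map_page(self, kind="MM_FILEPAGES"):
        self.rss[kind] += 1

    def unmap_page(self, kind="MM_FILEPAGES"):
        self.rss[kind] -= 1

    def check(self):
        """Return one report line per counter that is non-zero at teardown."""
        return [
            f"BUG: Bad rss-counter state mm:{self.ident} type:{kind} val:{val}"
            for kind, val in sorted(self.rss.items())
            if val != 0
        ]


mm = ToyMm("0000000048578720")
mm.map_page()
mm.map_page()
mm.unmap_page()  # one of the two file pages is never unmapped
for line in mm.check():
    print(line)
```

A balanced sequence of map_page()/unmap_page() calls yields an empty
report, which is the state teardown expects.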

> [   59.296485] page: refcount:3 mapcount:1 mapping:00000000828f872f
> index:0x0 pfn:0x13ab4f

This folio still has a mapcount of 1.
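
The same numbers can be read back out of the raw words in the dump. The
sketch below assumes the x86_64 struct page layout where one 64-bit word
packs _mapcount in the low half and _refcount in the high half, with
_mapcount stored as (mapcount - 1). The flag bit positions are also an
assumption, limited to the four bits the dump itself names.

```python
# Decode fields from the dumped page. Bit positions and word layout are
# assumptions chosen to agree with the kernel's own decoding in the dump.
FLAG_BITS = {0: "locked", 2: "referenced", 3: "uptodate", 5: "lru"}


def decode_flags(flags):
    """Names of the known low page-flag bits that are set, lowest bit first."""
    return [name for bit, name in sorted(FLAG_BITS.items()) if flags & (1 << bit)]


def decode_counts(word):
    """Split the raw word into (refcount, mapcount)."""
    stored = word & 0xFFFFFFFF
    if stored >= 1 << 31:        # interpret the low half as signed 32-bit
        stored -= 1 << 32
    refcount = word >> 32
    return refcount, stored + 1  # _mapcount is stored as (mapcount - 1)


print(decode_flags(0x2ffff800000002d))
print(decode_counts(0x0000000300000000))
```

Under these assumptions the third word of the second "raw:" line gives
refcount 3 and mapcount 1, matching the "page:" line.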

> [   59.297223] memcg:ffff888105a32000
> [   59.297533] aops:btrfs_aops [btrfs] ino:1031b
> [   59.298188] flags:
> 0x2ffff800000002d(locked|referenced|uptodate|lru|node=0|zone=2|lastcpupid=0x1ffff)
> [   59.298955] raw: 02ffff800000002d ffffea0004184948 ffffea0004c40c88
> ffff888107c7a2b8
> [   59.299607] raw: 0000000000000000 0000000000000000 0000000300000000
> ffff888105a32000
> [   59.300261] page dumped because: VM_BUG_ON_FOLIO(folio_mapped(folio))
> [   59.300846] ------------[ cut here ]------------
> [   59.301256] kernel BUG at mm/filemap.c:154!
> [   59.301635] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [   59.302144] CPU: 4 UID: 0 PID: 17354 Comm: umount Tainted: G
>  OE      6.14.0-rc1-custom+ #211
> [   59.302953] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> [   59.303447] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> unknown 02/02/2022
> [   59.304291] RIP: 0010:filemap_unaccount_folio+0x153/0x1f0
> [   59.305224] Code: b0 f0 00 00 00 e9 5d f6 00 00 48 c7 c6 80 1b 43 82
> 48 89 df e8 ae 89 04 00 0f 0b 48 c7 c6 10 d8 44 82 48 89 df e8 9d 89 04
> 00 <0f> 0b 48 8b 06 a8 40 74 4c 8b 43 50 e9 ce fe ff ff 48 c7 c6 80 1b
> [   59.308807] RSP: 0018:ffffc90005387a18 EFLAGS: 00010046
> [   59.309382] RAX: 0000000000000039 RBX: ffffea0004ead3c0 RCX:
> 0000000000000027
> [   59.310313] RDX: 0000000000000000 RSI: 0000000000000001 RDI:
> ffff888277c21880
> [   59.311856] RBP: ffff888107c7a2b8 R08: ffffffff82cad0a8 R09:
> 00000000fffff000
> [   59.312879] R10: ffffffff82c55100 R11: 6d75642065676170 R12:
> 0000000000000001
> [   59.313607] R13: ffffffffffffffff R14: ffffc90005387ad8 R15:
> ffff888107c7a2c0
> [   59.314347] FS:  00007ff0455f2b80(0000) GS:ffff888277c00000(0000)
> knlGS:0000000000000000
> [   59.315159] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   59.315744] CR2: 000055e761f94f58 CR3: 0000000166a44000 CR4:
> 00000000000006f0
> [   59.316476] Call Trace:
> [   59.316749]  <TASK>
> [   59.321408]  delete_from_page_cache_batch+0x95/0x3c0
> [   59.321912]  truncate_inode_pages_range+0x142/0x570
> [   59.322413]  btrfs_evict_inode+0x8b/0x390 [btrfs]

So we're evicting an inode, and we ask truncate_inode_pages_range()
to get rid of all the folios in the inode's mapping.  It walks the
rmap to find all the mappings ... and doesn't find the one above because
the process that mapped it has already exited.
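
A simplified model of that sequence may help. It is a sketch with
hypothetical names, not the kernel's data structures: eviction can only
unmap a folio through VMAs still attached to the file, so a page-table
entry that was skipped when the process exited leaves a stale mapcount
that nothing can clear later.

```python
# Toy model (hypothetical names): eviction can only unmap a folio through
# VMAs that are still attached to the file's mapping.
class ToyFolio:
    def __init__(self):
        self.mapcount = 0


class ToyVma:
    def __init__(self, folio):
        self.folio = folio
        self.pte_present = True
        folio.mapcount += 1

    def zap(self, skip_pte=False):
        """Tear down the page-table entry; skip_pte simulates the suspected bug."""
        if self.pte_present and not skip_pte:
            self.pte_present = False
            self.folio.mapcount -= 1


def evict(folio, attached_vmas):
    """Unmap via every VMA still attached, then refuse to drop a mapped folio."""
    for vma in attached_vmas:
        vma.zap()
    if folio.mapcount != 0:
        raise RuntimeError("folio still mapped at eviction")
    return "removed"


def run(skip_pte):
    folio = ToyFolio()
    vma = ToyVma(folio)
    attached = [vma]
    vma.zap(skip_pte=skip_pte)  # process exit zaps its page tables ...
    attached.remove(vma)        # ... and detaches the VMA from the file
    return evict(folio, attached)


print(run(skip_pte=False))
```

Calling run(skip_pte=True) raises instead, which is the toy equivalent of
the VM_BUG_ON_FOLIO(folio_mapped(folio)) in the trace.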

We need to figure out how we came to not unmap the page from the page
tables originally.  Looking through the merge log of the mm tree, my
suspicion falls on the following patchsets:

       - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
         addresses an issue where "huge" amounts of pte pagetables are
         accumulated:

       - "mm/vma: make more mmap logic userland testable" from Lorenzo
         Stoakes continues the work of moving vma-related code into the
         (relatively) new mm/vma.c

but of course it could be almost anything.



* Re: Btrfs crash on generic/437 on x86_64
  2025-02-10  3:31 ` Matthew Wilcox
@ 2025-02-10  3:43   ` Qu Wenruo
  2025-02-10  3:59     ` Matthew Wilcox
  0 siblings, 1 reply; 4+ messages in thread
From: Qu Wenruo @ 2025-02-10  3:43 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Linux Memory Management List, linux-btrfs, fstests



On 2025/2/10 14:01, Matthew Wilcox wrote:
> We need to figure out how we came to not unmap the page from the page
> tables originally.  Looking through the merge log of the mm tree, my
> suspicion falls on the following patchsets:
>
>         - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
>           addresses an issue where "huge" amounts of pte pagetables are
>           accumulated:
>
>         - "mm/vma: make more mmap logic userland testable" from Lorenzo
>           Stoakes continues the work of moving vma-related code into the
>           (relatively) new mm/vma.c
>
> but of course it could be almost anything.
>
Bisecting now. Thankfully v6.13 seems good, so the regression must have
come in during this merge window.

I will report back with the bisect result and log.

Thanks,
Qu
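
Because the crash needs repeated runs to show up, each bisect step has to
repeat the test before calling a commit good. The sketch below simulates
that on a linear list of commit indices. The per-run failure rate, the
commit numbering and all function names are made-up assumptions for
illustration; only the figure of 32 runs comes from the report above. A
real bisect would drive git bisect and fstests instead.

```python
# Sketch of bisecting with a flaky reproducer. All names and numbers are
# illustrative assumptions, not measurements from this bug.
import random


def is_bad(commit, first_bad, rng, runs=32, fail_rate=0.2):
    """Run the reproducer up to `runs` times; any crash marks the commit bad."""
    if commit < first_bad:
        return False  # the bug is not present at all
    return any(rng.random() < fail_rate for _ in range(runs))


def bisect(good, bad, test):
    """Return the first bad commit, given a known-good and a known-bad index."""
    while bad - good > 1:
        mid = (good + bad) // 2
        if test(mid):
            bad = mid
        else:
            good = mid
    return bad


def false_good_chance(fail_rate, runs):
    """Probability that a buggy commit survives `runs` runs without crashing."""
    return (1 - fail_rate) ** runs


rng = random.Random(0)
print(bisect(0, 1000, lambda c: is_bad(c, 637, rng)))
print(false_good_chance(0.2, 32))
```

A single false "good" verdict sends the search into the wrong half, so the
number of runs per step should be chosen to keep false_good_chance() small
for whatever failure rate is actually observed.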



* Re: Btrfs crash on generic/437 on x86_64
  2025-02-10  3:43   ` Qu Wenruo
@ 2025-02-10  3:59     ` Matthew Wilcox
  0 siblings, 0 replies; 4+ messages in thread
From: Matthew Wilcox @ 2025-02-10  3:59 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Linux Memory Management List, linux-btrfs, fstests

On Mon, Feb 10, 2025 at 02:13:53PM +1030, Qu Wenruo wrote:
> On 2025/2/10 14:01, Matthew Wilcox wrote:
> > > [   58.305921] BTRFS info (device dm-0): using free-space-tree
> > > [   58.319296] run fstests generic/437 at 2025-02-10 13:24:19
> > > [   59.283069] BUG: Bad rss-counter state mm:0000000048578720
> > > type:MM_FILEPAGES val:1

> > We need to figure out how we came to not unmap the page from the page
> > tables originally.  Looking through the merge log of the mm tree, my
> > suspicion falls on the following patchsets:
> > 
> >         - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
> >           addresses an issue where "huge" amounts of pte pagetables are
> >           accumulated:

https://lore.kernel.org/linux-mm/2766D04E-5A04-4BF6-A2A3-5683A3054973@nvidia.com/
looks like a similar splat with the problem narrowed down to the reclaim
PTE pages patchset.


