On Fri, 2 Dec 2022 21:57:09 +0000 Matthew Wilcox wrote:
> On Fri, Dec 02, 2022 at 04:58:42PM +0000, Matthew Wilcox wrote:
> > Landing on 793917d997df makes a lot more sense.  That's where we
> > actually start using large folios.  It doesn't really help narrow
> > down the problem.  I have an idea for what it might be; patch to
> > try will follow.  But I'll need feedback by email.
>
> This will give us a bit more information when it does happen.
> Further patch to catch it earlier will come "soon".
>
> diff --git a/lib/xarray.c b/lib/xarray.c
> index 6f47f6375808..b358b4e1dac6 100644
> --- a/lib/xarray.c
> +++ b/lib/xarray.c
> @@ -6,6 +6,7 @@
>   * Author: Matthew Wilcox
>   */
>
> +#define XA_DEBUG
>  #include
>  #include
>  #include
> @@ -207,6 +208,12 @@ static void *xas_descend(struct xa_state *xas, struct xa_node *node)
>  	if (xa_is_sibling(entry)) {
>  		offset = xa_to_sibling(entry);
>  		entry = xa_entry(xas->xa, node, offset);
> +
> +		if (xa_is_sibling(entry)) {
> +			printk("***BAD SIBLING*** index %ld offset %d\n",
> +					xas->xa_index, offset);
> +			xa_dump_node(node);
> +		}
>  	}
>
>  	xas->xa_offset = offset;

Here is the crash with your patch (full dmesg in the attachment):

[ 876.422920] ***BAD SIBLING*** index 104046 offset 40
[ 876.422922] node ffffa0f3a6367b50 offset 25 parent ffffa0f37bd7a480 shift 0 count 64 values 8 array ffffa0f1fab08dc0 list ffffa0f3a6367b68 ffffa0f3a6367b68 marks 0 0 0
[ 876.422926] BUG: kernel NULL pointer dereference, address: 0000000000000082
[ 876.422928] #PF: supervisor read access in kernel mode
[ 876.422929] #PF: error_code(0x0000) - not-present page
[ 876.422930] PGD 0 P4D 0
[ 876.422931] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 876.422933] CPU: 19 PID: 8313 Comm: deluge-gtk Not tainted 5.17.0-rc4_ap_test-00163-g793917d997df-dirty #3
[ 876.422934] Hardware name: Micro-Star International Co., Ltd. MS-7C35/MEG X570 UNIFY (MS-7C35), BIOS A.C3 03/15/2022
[ 876.422935] RIP: 0010:next_uptodate_page+0x40/0x1e0
[ 876.422939] Code: 0f 84 2f 01 00 00 48 81 ff 06 04 00 00 0f 84 a3 00 00 00 48 81 ff 02 04 00 00 0f 84 22 01 00 00 40 f6 c7 01 0f 85 8c 00 00 00 <48> 8b 07 a8 01 0f 85 81 00 00 00 8b 47 34 85 c0 74 7a 8d 50 01 4c
[ 876.422940] RSP: 0000:ffffbf0704aefce8 EFLAGS: 00010246
[ 876.422942] RAX: 0000000000000082 RBX: ffffbf0704aefd40 RCX: 000000000001967d
[ 876.422942] RDX: ffffbf0704aefd40 RSI: ffffa0f1fab08db8 RDI: 0000000000000082
[ 876.422943] RBP: ffffa0f1fab08db8 R08: 00000000ffffdfff R09: 00000000ffffdfff
[ 876.422944] R10: ffffffff9ee72dc0 R11: ffffffff9ee72dc0 R12: 000000000001967d
[ 876.422944] R13: ffffa0f3cc1c06c0 R14: ffffa0f1fab08db8 R15: 000000000001966e
[ 876.422945] FS: 00007f8971ffb6c0(0000) GS:ffffa0f87eec0000(0000) knlGS:0000000000000000
[ 876.422946] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 876.422947] CR2: 0000000000000082 CR3: 000000010fada000 CR4: 0000000000750ee0
[ 876.422947] PKRU: 55555554
[ 876.422948] Call Trace:
[ 876.422949]
[ 876.422951] filemap_map_pages+0xa3/0x570
[ 876.422954] xfs_filemap_map_pages+0x3f/0x60
[ 876.422957] __handle_mm_fault+0xfbe/0x15c0
[ 876.422959] ? __hrtimer_init+0xd0/0xd0
[ 876.422963] handle_mm_fault+0xbc/0x280
[ 876.422964] do_user_addr_fault+0x1bc/0x640
[ 876.422968] exc_page_fault+0x60/0x140
[ 876.422971] ? asm_exc_page_fault+0x8/0x30
[ 876.422973] asm_exc_page_fault+0x1e/0x30
[ 876.422975] RIP: 0033:0x7f89ab02e409
[ 876.422977] Code: 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 83 fa 20 72 27 fe 6f 06 48 83 fa 40 0f 87 a9 00 00 00 c5 fe 6f 4c 16 e0 c5 fe
[ 876.422978] RSP: 002b:00007f8971ffa908 EFLAGS: 00010202
[ 876.422979] RAX: 00007f895c008900 RBX: 0000000000000000 RCX: 00007f8971ffaa90
[ 876.422980] RDX: 0000000000004000 RSI: 00007f60e9fd2b7d RDI: 00007f895c008900
[ 876.422981] RBP: 00007f8971ffa928 R08: 00000000638a7d7a R09: 0000000000000000
[ 876.422981] R10: 0000000000000008 R11: 0000000000000246 R12: 00007f895c094a10
[ 876.422982] R13: 00007f89640016d0 R14: 0000000019670b7d R15: 0000000000004000
[ 876.422984]
[ 876.422984] Modules linked in: overlay xt_addrtype amdgpu drm_ttm_helper ttm gpu_sched drm_kms_helper backlight iwlmvm syscopyarea mac80211 sysfillrect libarc4 sysimgblt fb_sys_fops iwlwifi i2c_piix4 cfg80211 k10temp fuse configfs efivarfs
[ 876.422994] CR2: 0000000000000082
[ 876.422995] ---[ end trace 0000000000000000 ]---
[ 876.422996] RIP: 0010:next_uptodate_page+0x40/0x1e0
[ 876.422998] Code: 0f 84 2f 01 00 00 48 81 ff 06 04 00 00 0f 84 a3 00 00 00 48 81 ff 02 04 00 00 0f 84 22 01 00 00 40 f6 c7 01 0f 85 8c 00 00 00 <48> 8b 07 a8 01 0f 85 81 00 00 00 8b 47 34 85 c0 74 7a 8d 50 01 4c
[ 876.422999] RSP: 0000:ffffbf0704aefce8 EFLAGS: 00010246
[ 876.423000] RAX: 0000000000000082 RBX: ffffbf0704aefd40 RCX: 000000000001967d
[ 876.423001] RDX: ffffbf0704aefd40 RSI: ffffa0f1fab08db8 RDI: 0000000000000082
[ 876.423002] RBP: ffffa0f1fab08db8 R08: 00000000ffffdfff R09: 00000000ffffdfff
[ 876.423003] R10: ffffffff9ee72dc0 R11: ffffffff9ee72dc0 R12: 000000000001967d
[ 876.423003] R13: ffffa0f3cc1c06c0 R14: ffffa0f1fab08db8 R15: 000000000001966e
[ 876.423004] FS: 00007f8971ffb6c0(0000) GS:ffffa0f87eec0000(0000) knlGS:0000000000000000
[ 876.423005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 876.423006] CR2: 0000000000000082 CR3: 000000010fada000 CR4: 0000000000750ee0
[ 876.423007] PKRU: 55555554

I will, of course, test further patches to help with this issue.

--
Mikhail Pletnev