linux-mm.kvack.org archive mirror
* [syzbot] [mm?] kernel BUG in hpage_collapse_scan_file (2)
@ 2026-01-25  2:23 syzbot
  2026-01-25 12:10 ` Lance Yang
  0 siblings, 1 reply; 4+ messages in thread
From: syzbot @ 2026-01-25  2:23 UTC
  To: Liam.Howlett, akpm, baohua, baolin.wang, david, dev.jain,
	lance.yang, linux-kernel, linux-mm, lorenzo.stoakes, npache,
	ryan.roberts, syzkaller-bugs, ziy

Hello,

syzbot found the following issue on:

HEAD commit:    ca3a02fda4da Add linux-next specific files for 20260123
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=10c42452580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=10f2b64f8f12b9a4
dashboard link: https://syzkaller.appspot.com/bug?extid=bf6e6a6ca143afea5ca2
compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17f7cbfa580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=112d405a580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/291ebca63a31/disk-ca3a02fd.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/b2112a214b54/vmlinux-ca3a02fd.xz
kernel image: https://storage.googleapis.com/syzbot-assets/77d1ae437e07/bzImage-ca3a02fd.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+bf6e6a6ca143afea5ca2@syzkaller.appspotmail.com

node ffff888148816ec0 offset 0 parent ffff888148817700 shift 0 count 64 values 0 array ffff88807be6b0f0 list ffff888148816ed8 ffff888148816ed8 marks 0 0 0
------------[ cut here ]------------
kernel BUG at ./include/linux/xarray.h:1441!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 6017 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/13/2026
RIP: 0010:XAS_INVALID include/linux/xarray.h:1441 [inline]
RIP: 0010:collapse_file mm/khugepaged.c:2041 [inline]
RIP: 0010:hpage_collapse_scan_file+0x4e0c/0x50e0 mm/khugepaged.c:2387
Code: ff 48 89 df 48 c7 c6 c0 8c bc 8b e8 ee 6c f6 fe 90 0f 0b 48 85 db 0f 84 29 01 00 00 e8 bd 34 91 ff 48 89 df e8 f5 c4 4b 09 90 <0f> 0b e8 ad 34 91 ff 48 89 df 48 c7 c6 c0 8c bc 8b e8 be 6c f6 fe
RSP: 0018:ffffc9000422f120 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888148816ec0 RCX: 6c90c8cc739bf400
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: ffffc9000422f428 R08: ffffc9000422eea7 R09: 1ffff92000845dd4
R10: dffffc0000000000 R11: fffff52000845dd5 R12: 00000003fffffffc
R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000555592982500(0000) GS:ffff8881256ef000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b30363fff CR3: 0000000031994000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 madvise_collapse+0x42f/0xb30 mm/khugepaged.c:2817
 madvise_vma_behavior+0x10ad/0x43f0 mm/madvise.c:1372
 madvise_walk_vmas+0x57a/0xaf0 mm/madvise.c:1721
 madvise_do_behavior+0x386/0x540 mm/madvise.c:1937
 do_madvise+0x1fa/0x2e0 mm/madvise.c:2030
 __do_sys_madvise mm/madvise.c:2039 [inline]
 __se_sys_madvise mm/madvise.c:2037 [inline]
 __x64_sys_madvise+0xa6/0xc0 mm/madvise.c:2037
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xe2/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f948ad9acb9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffd1ef477c8 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007f948b015fa0 RCX: 00007f948ad9acb9
RDX: 0000000000000019 RSI: 0000000000600003 RDI: 0000200000000000
RBP: 00007f948ae08bf7 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f948b015fac R14: 00007f948b015fa0 R15: 00007f948b015fa0
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:XAS_INVALID include/linux/xarray.h:1441 [inline]
RIP: 0010:collapse_file mm/khugepaged.c:2041 [inline]
RIP: 0010:hpage_collapse_scan_file+0x4e0c/0x50e0 mm/khugepaged.c:2387
Code: ff 48 89 df 48 c7 c6 c0 8c bc 8b e8 ee 6c f6 fe 90 0f 0b 48 85 db 0f 84 29 01 00 00 e8 bd 34 91 ff 48 89 df e8 f5 c4 4b 09 90 <0f> 0b e8 ad 34 91 ff 48 89 df 48 c7 c6 c0 8c bc 8b e8 be 6c f6 fe
RSP: 0018:ffffc9000422f120 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888148816ec0 RCX: 6c90c8cc739bf400
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: ffffc9000422f428 R08: ffffc9000422eea7 R09: 1ffff92000845dd4
R10: dffffc0000000000 R11: fffff52000845dd5 R12: 00000003fffffffc
R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000555592982500(0000) GS:ffff8881256ef000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c0001002a0 CR3: 0000000031994000 CR4: 00000000003526f0


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup



* Re: [syzbot] [mm?] kernel BUG in hpage_collapse_scan_file (2)
  2026-01-25  2:23 [syzbot] [mm?] kernel BUG in hpage_collapse_scan_file (2) syzbot
@ 2026-01-25 12:10 ` Lance Yang
  2026-01-25 18:13   ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 4+ messages in thread
From: Lance Yang @ 2026-01-25 12:10 UTC
  To: willy
  Cc: syzbot+bf6e6a6ca143afea5ca2, Liam.Howlett, akpm, baohua,
	baolin.wang, david, dev.jain, lance.yang, linux-kernel, linux-mm,
	lorenzo.stoakes, npache, ryan.roberts, syzkaller-bugs, ziy

Ccing Willy.

On Sat, 24 Jan 2026 18:23:28 -0800, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    ca3a02fda4da Add linux-next specific files for 20260123
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=10c42452580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=10f2b64f8f12b9a4
> dashboard link: https://syzkaller.appspot.com/bug?extid=bf6e6a6ca143afea5ca2
> compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17f7cbfa580000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=112d405a580000
> 
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/291ebca63a31/disk-ca3a02fd.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/b2112a214b54/vmlinux-ca3a02fd.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/77d1ae437e07/bzImage-ca3a02fd.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+bf6e6a6ca143afea5ca2@syzkaller.appspotmail.com
> 
> node ffff888148816ec0 offset 0 parent ffff888148817700 shift 0 count 64 values 0 array ffff88807be6b0f0 list ffff888148816ed8 ffff888148816ed8 marks 0 0 0
> ------------[ cut here ]------------
> kernel BUG at ./include/linux/xarray.h:1441!
> Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
> CPU: 0 UID: 0 PID: 6017 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/13/2026
> RIP: 0010:XAS_INVALID include/linux/xarray.h:1441 [inline]

Seems like that is:

```
static inline struct xa_state *XAS_INVALID(struct xa_state *xas)
{
	XA_NODE_BUG_ON(xas->xa_node, xas_valid(xas));
	return xas;
}
```

which was added by commit 43b00759f21b (not landed upstream yet):

```
commit 43b00759f21b10142094d1ae5ff65cbb368953a3
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sun Dec 14 10:53:31 2025 -0500

    XArray: Add extra debugging check to xas_lock and friends

    While tracking down a recent bug, we discovered somewhere that had
    forgotten to call xas_reset() before calling xas_lock().  Add a debug
    check to be sure that doesn't happen in future and fix all the places in
    the test suite which were carelessly doing just this.

    Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
```

which catches places that forget to reset xas before locking.

> RIP: 0010:collapse_file mm/khugepaged.c:2041 [inline]

Yeah, maybe it caught a bug in collapse_file() ...

When we lock again with xas_lock_irq(), xas->xa_node is still pointing
at a node from the earlier xas_load(), so the BUG_ON fires, IIUC.

Fix it by calling xas_set() before each xas_lock_irq() to reset the
state (in one place the existing xas_set() just moves ahead of the
lock). One spot in rollback doesn't need xas at all, so I changed it
to xa_lock_irq() directly.
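
A minimal userspace sketch of the failure mode described above. This is
a toy model, not kernel code: every model_* name and MODEL_RESTART are
made up for illustration. It assumes only what the XAS_INVALID() snippet
shows (locking is a bug while the state is "valid"), plus the assumption
that resetting the state stores a marker with its low bits set, whereas
a walk leaves a real, aligned node pointer behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker standing in for "state was reset, walk from the top". */
#define MODEL_RESTART ((void *)(uintptr_t)3)

/* Toy stand-in for struct xa_state: just the cached node and the index. */
struct model_xas {
	void *node;
	unsigned long index;
};

/* Models xas_set(): pick a new index and drop the cached node. */
static void model_xas_set(struct model_xas *xas, unsigned long index)
{
	xas->index = index;
	xas->node = MODEL_RESTART;
}

/* Models a lookup: the walk leaves a real node pointer cached in the state. */
static void model_xas_load(struct model_xas *xas, void *node)
{
	xas->node = node;
}

/* "Valid" here means: the cached pointer is not a marker (low bits clear). */
static bool model_xas_valid(const struct model_xas *xas)
{
	return ((uintptr_t)xas->node & 3) == 0;
}

/* Returns false exactly where the debug check would BUG before locking. */
static bool model_xas_lock_ok(const struct model_xas *xas)
{
	return !model_xas_valid(xas);
}

/* Replays the sequence from the report: set, load, lock, then set and lock. */
void model_selftest(void)
{
	static long fake_node;	/* any suitably aligned object works */
	struct model_xas xas;

	model_xas_set(&xas, 0);
	assert(model_xas_lock_ok(&xas));	/* fresh state: locking is fine */

	model_xas_load(&xas, &fake_node);
	assert(!model_xas_lock_ok(&xas));	/* stale node: check would fire */

	model_xas_set(&xas, 0);
	assert(model_xas_lock_ok(&xas));	/* reset first: locking is fine */
}
```

In this model the middle assert corresponds to the crash site, and the
last one to the xas_set() added ahead of xas_lock_irq() in the diff
below. Whether that is the right fix for the real code is a separate
question.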

---8<---
commit 2003255c52846ab10cad6c2e57cda4d17dddadbe
Author: Lance Yang <lance.yang@linux.dev>
Date:   Sun Jan 25 19:37:56 2026 +0800

    HACK

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index fba6aea5bea6..3656ae491385 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2038,6 +2038,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 			try_to_unmap(folio,
 					TTU_IGNORE_MLOCK | TTU_BATCH_FLUSH);

+		xas_set(&xas, index);
 		xas_lock_irq(&xas);

 		VM_BUG_ON_FOLIO(folio != xa_load(xas.xa, index), folio);
@@ -2140,9 +2141,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 		int nr_none_check = 0;

 		i_mmap_lock_read(mapping);
-		xas_lock_irq(&xas);
-
 		xas_set(&xas, start);
+		xas_lock_irq(&xas);
 		for (index = start; index < end; index++) {
 			if (!xas_next(&xas)) {
 				xas_store(&xas, XA_RETRY_ENTRY);
@@ -2192,6 +2192,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 			goto rollback;
 		}
 	} else {
+		xas_set(&xas, start);
 		xas_lock_irq(&xas);
 	}

@@ -2250,9 +2251,9 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 rollback:
 	/* Something went wrong: roll back page cache changes */
 	if (nr_none) {
-		xas_lock_irq(&xas);
+		xa_lock_irq(&mapping->i_pages);
 		mapping->nrpages -= nr_none;
-		xas_unlock_irq(&xas);
+		xa_unlock_irq(&mapping->i_pages);
 		shmem_uncharge(mapping->host, nr_none);
 	}
---

Tested with the syzbot reproducer[1], no more crashes :)

[1] https://syzkaller.appspot.com/x/repro.c?x=112d405a580000

Cheers,
Lance

[...]



* Re: [syzbot] [mm?] kernel BUG in hpage_collapse_scan_file (2)
  2026-01-25 12:10 ` Lance Yang
@ 2026-01-25 18:13   ` David Hildenbrand (Red Hat)
  2026-01-26  1:54     ` Lance Yang
  0 siblings, 1 reply; 4+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-25 18:13 UTC
  To: Lance Yang, willy
  Cc: syzbot+bf6e6a6ca143afea5ca2, Liam.Howlett, akpm, baohua,
	baolin.wang, dev.jain, linux-kernel, linux-mm, lorenzo.stoakes,
	npache, ryan.roberts, syzkaller-bugs, ziy

On 1/25/26 13:10, Lance Yang wrote:
> Ccing Willy.
> 
> On Sat, 24 Jan 2026 18:23:28 -0800, syzbot wrote:
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit:    ca3a02fda4da Add linux-next specific files for 20260123
>> git tree:       linux-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=10c42452580000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=10f2b64f8f12b9a4
>> dashboard link: https://syzkaller.appspot.com/bug?extid=bf6e6a6ca143afea5ca2
>> compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17f7cbfa580000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=112d405a580000
>>
>> Downloadable assets:
>> disk image: https://storage.googleapis.com/syzbot-assets/291ebca63a31/disk-ca3a02fd.raw.xz
>> vmlinux: https://storage.googleapis.com/syzbot-assets/b2112a214b54/vmlinux-ca3a02fd.xz
>> kernel image: https://storage.googleapis.com/syzbot-assets/77d1ae437e07/bzImage-ca3a02fd.xz
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+bf6e6a6ca143afea5ca2@syzkaller.appspotmail.com
>>
>> node ffff888148816ec0 offset 0 parent ffff888148817700 shift 0 count 64 values 0 array ffff88807be6b0f0 list ffff888148816ed8 ffff888148816ed8 marks 0 0 0
>> ------------[ cut here ]------------
>> kernel BUG at ./include/linux/xarray.h:1441!
>> Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
>> CPU: 0 UID: 0 PID: 6017 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/13/2026
>> RIP: 0010:XAS_INVALID include/linux/xarray.h:1441 [inline]
> 
> Seems like that is:
> 
> ```
> static inline struct xa_state *XAS_INVALID(struct xa_state *xas)
> {
> 	XA_NODE_BUG_ON(xas->xa_node, xas_valid(xas));
> 	return xas;
> }
> ```

I think there was already a discussion about this recently; see

https://lore.kernel.org/linux-mm/aVvz3tYdu49TGkjI@mozart.vkv.me/

and Willy's reply, where he said it likely needs more thought:

https://lore.kernel.org/linux-mm/aVwm3MQ_ZDa_kU8c@casper.infradead.org/

-- 
Cheers

David



* Re: [syzbot] [mm?] kernel BUG in hpage_collapse_scan_file (2)
  2026-01-25 18:13   ` David Hildenbrand (Red Hat)
@ 2026-01-26  1:54     ` Lance Yang
  0 siblings, 0 replies; 4+ messages in thread
From: Lance Yang @ 2026-01-26  1:54 UTC
  To: David Hildenbrand (Red Hat), willy
  Cc: syzbot+bf6e6a6ca143afea5ca2, Liam.Howlett, akpm, baohua,
	baolin.wang, dev.jain, linux-kernel, linux-mm, lorenzo.stoakes,
	npache, ryan.roberts, syzkaller-bugs, ziy



On 2026/1/26 02:13, David Hildenbrand (Red Hat) wrote:
> On 1/25/26 13:10, Lance Yang wrote:
>> Ccing Willy.
>>
>> On Sat, 24 Jan 2026 18:23:28 -0800, syzbot wrote:
>>> Hello,
>>>
>>> syzbot found the following issue on:
>>>
>>> HEAD commit:    ca3a02fda4da Add linux-next specific files for 20260123
>>> git tree:       linux-next
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=10c42452580000
>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=10f2b64f8f12b9a4
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=bf6e6a6ca143afea5ca2
>>> compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17f7cbfa580000
>>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=112d405a580000
>>>
>>> Downloadable assets:
>>> disk image: https://storage.googleapis.com/syzbot-assets/291ebca63a31/disk-ca3a02fd.raw.xz
>>> vmlinux: https://storage.googleapis.com/syzbot-assets/b2112a214b54/vmlinux-ca3a02fd.xz
>>> kernel image: https://storage.googleapis.com/syzbot-assets/77d1ae437e07/bzImage-ca3a02fd.xz
>>>
>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>> Reported-by: syzbot+bf6e6a6ca143afea5ca2@syzkaller.appspotmail.com
>>>
>>> node ffff888148816ec0 offset 0 parent ffff888148817700 shift 0 count 64 values 0 array ffff88807be6b0f0 list ffff888148816ed8 ffff888148816ed8 marks 0 0 0
>>> ------------[ cut here ]------------
>>> kernel BUG at ./include/linux/xarray.h:1441!
>>> Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
>>> CPU: 0 UID: 0 PID: 6017 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/13/2026
>>> RIP: 0010:XAS_INVALID include/linux/xarray.h:1441 [inline]
>>
>> Seems like that is:
>>
>> ```
>> static inline struct xa_state *XAS_INVALID(struct xa_state *xas)
>> {
>>     XA_NODE_BUG_ON(xas->xa_node, xas_valid(xas));
>>     return xas;
>> }
>> ```
> 
> I think there was recently already a discussion about this.
> 
> See
> 
> https://lore.kernel.org/linux-mm/aVvz3tYdu49TGkjI@mozart.vkv.me/
> 
> 
> And where Willy said that likely it needs more thought:
> 
> https://lore.kernel.org/linux-mm/aVwm3MQ_ZDa_kU8c@casper.infradead.org/

Ah, I see. Thanks for the pointer!

