linux-mm.kvack.org archive mirror
From: Lance Yang <lance.yang@linux.dev>
To: willy@infradead.org
Cc: syzbot+bf6e6a6ca143afea5ca2@syzkaller.appspotmail.com,
	Liam.Howlett@oracle.com, akpm@linux-foundation.org,
	baohua@kernel.org, baolin.wang@linux.alibaba.com,
	david@kernel.org, dev.jain@arm.com, lance.yang@linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	lorenzo.stoakes@oracle.com, npache@redhat.com,
	ryan.roberts@arm.com, syzkaller-bugs@googlegroups.com,
	ziy@nvidia.com
Subject: Re: [syzbot] [mm?] kernel BUG in hpage_collapse_scan_file (2)
Date: Sun, 25 Jan 2026 20:10:01 +0800
Message-ID: <20260125121001.32733-1-lance.yang@linux.dev>
In-Reply-To: <69757ea0.a00a0220.33ccc7.0017.GAE@google.com>

Ccing Willy.

On Sat, 24 Jan 2026 18:23:28 -0800, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    ca3a02fda4da Add linux-next specific files for 20260123
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=10c42452580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=10f2b64f8f12b9a4
> dashboard link: https://syzkaller.appspot.com/bug?extid=bf6e6a6ca143afea5ca2
> compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17f7cbfa580000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=112d405a580000
> 
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/291ebca63a31/disk-ca3a02fd.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/b2112a214b54/vmlinux-ca3a02fd.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/77d1ae437e07/bzImage-ca3a02fd.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+bf6e6a6ca143afea5ca2@syzkaller.appspotmail.com
> 
> node ffff888148816ec0 offset 0 parent ffff888148817700 shift 0 count 64 values 0 array ffff88807be6b0f0 list ffff888148816ed8 ffff888148816ed8 marks 0 0 0
> ------------[ cut here ]------------
> kernel BUG at ./include/linux/xarray.h:1441!
> Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
> CPU: 0 UID: 0 PID: 6017 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/13/2026
> RIP: 0010:XAS_INVALID include/linux/xarray.h:1441 [inline]

Seems like that is:

```
static inline struct xa_state *XAS_INVALID(struct xa_state *xas)
{
	XA_NODE_BUG_ON(xas->xa_node, xas_valid(xas));
	return xas;
}
```

Which was added by commit 43b00759f21b (not landed upstream yet):

```
commit 43b00759f21b10142094d1ae5ff65cbb368953a3
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sun Dec 14 10:53:31 2025 -0500

    XArray: Add extra debugging check to xas_lock and friends

    While tracking down a recent bug, we discovered somewhere that had
    forgotten to call xas_reset() before calling xas_lock().  Add a debug
    check to be sure that doesn't happen in future and fix all the places in
    the test suite which were carelessly doing just this.

    Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
```

which catches places that forget to reset xas before locking.
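
To make the check concrete, here is a minimal userspace sketch of the
idea. It is not kernel code: every toy_* name is hypothetical, and
treating "valid" as "low two bits of the node pointer are clear" is my
reading of xas_valid(), so take it as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy userspace model; every toy_* name is hypothetical, not kernel API. */
struct toy_node {
	int dummy;
};

struct toy_xa_state {
	unsigned long index;
	struct toy_node *node;	/* cached walk position, or a sentinel */
};

/* Sentinel meaning "no cached position, walk from the top next time". */
#define TOY_RESTART ((struct toy_node *)3UL)

/* Assumption: a cursor is "valid" when node is not a sentinel value. */
static bool toy_xas_valid(const struct toy_xa_state *xas)
{
	return ((uintptr_t)xas->node & 3) == 0;
}

/* Mirrors the effect of xas_set(): new index, cached position dropped. */
static void toy_xas_set(struct toy_xa_state *xas, unsigned long index)
{
	xas->index = index;
	xas->node = TOY_RESTART;
}

/* Stand-in for a lookup that leaves the cursor pointing at a real node. */
static void toy_xas_load(struct toy_xa_state *xas, struct toy_node *node)
{
	assert(((uintptr_t)node & 3) == 0);	/* real nodes are aligned */
	xas->node = node;
}

/*
 * The debug check, expressed as a predicate: taking the lock is only
 * acceptable while the cursor holds no cached position. The kernel
 * version BUGs instead of returning false.
 */
static bool toy_lock_allowed(const struct toy_xa_state *xas)
{
	return !toy_xas_valid(xas);
}
```

In this model a cursor that has just been set passes the check, one that
has been through a load does not, and setting it again makes it pass.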

> RIP: 0010:collapse_file mm/khugepaged.c:2041 [inline]

Yeah, maybe it caught a bug in collapse_file() ...

IIUC, when we retake the lock with xas_lock_irq(), xas->xa_node still
points at a node left over from the earlier xas_load(), so the new
XA_NODE_BUG_ON() check fires.

Fix it by calling xas_set() before xas_lock_irq() to reset the state.
One spot in the rollback path doesn't actually need xas at all, so I
changed it to take xa_lock_irq() directly.
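
The two cases can be sketched in userspace as follows. This is only an
illustration under assumptions: the toy_* types and helpers are
hypothetical, the lock is a plain flag rather than a spinlock, and the
cursor lock returns false where the kernel check would BUG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy userspace model; every toy_* name is hypothetical, not kernel API. */
struct toy_array {
	int locked;	/* stand-in for the array spinlock */
	long nrpages;	/* counter protected by that lock */
};

struct toy_node {
	int dummy;
};

struct toy_cursor {
	struct toy_array *xa;
	unsigned long index;
	struct toy_node *node;	/* cached position, or TOY_RESTART */
};

#define TOY_RESTART ((struct toy_node *)3UL)

static void toy_array_lock(struct toy_array *xa)
{
	assert(!xa->locked);
	xa->locked = 1;
}

static void toy_array_unlock(struct toy_array *xa)
{
	assert(xa->locked);
	xa->locked = 0;
}

/* Drop any cached position so the next walk restarts from the top. */
static void toy_cursor_set(struct toy_cursor *c, unsigned long index)
{
	c->index = index;
	c->node = TOY_RESTART;
}

/* Locking through the cursor: refuse if a cached position is still held. */
static bool toy_cursor_lock(struct toy_cursor *c)
{
	if (((uintptr_t)c->node & 3) == 0)
		return false;	/* stale cursor; the kernel check BUGs here */
	toy_array_lock(c->xa);
	return true;
}

/*
 * Rollback-style bookkeeping: only the lock is needed, no walk is
 * performed, so the cursor state does not matter.
 */
static void toy_uncount(struct toy_array *xa, long nr_none)
{
	toy_array_lock(xa);
	xa->nrpages -= nr_none;
	toy_array_unlock(xa);
}
```

With a stale cursor, toy_cursor_lock() refuses while toy_uncount()
still works, and a toy_cursor_set() beforehand makes the cursor lock
succeed, which is the shape of the change in the patch.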

---8<---
commit 2003255c52846ab10cad6c2e57cda4d17dddadbe
Author: Lance Yang <lance.yang@linux.dev>
Date:   Sun Jan 25 19:37:56 2026 +0800

    HACK

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index fba6aea5bea6..3656ae491385 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2038,6 +2038,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 			try_to_unmap(folio,
 					TTU_IGNORE_MLOCK | TTU_BATCH_FLUSH);

+		xas_set(&xas, index);
 		xas_lock_irq(&xas);

 		VM_BUG_ON_FOLIO(folio != xa_load(xas.xa, index), folio);
@@ -2140,9 +2141,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 		int nr_none_check = 0;

 		i_mmap_lock_read(mapping);
-		xas_lock_irq(&xas);
-
 		xas_set(&xas, start);
+		xas_lock_irq(&xas);
 		for (index = start; index < end; index++) {
 			if (!xas_next(&xas)) {
 				xas_store(&xas, XA_RETRY_ENTRY);
@@ -2192,6 +2192,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 			goto rollback;
 		}
 	} else {
+		xas_set(&xas, start);
 		xas_lock_irq(&xas);
 	}

@@ -2250,9 +2251,9 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 rollback:
 	/* Something went wrong: roll back page cache changes */
 	if (nr_none) {
-		xas_lock_irq(&xas);
+		xa_lock_irq(&mapping->i_pages);
 		mapping->nrpages -= nr_none;
-		xas_unlock_irq(&xas);
+		xa_unlock_irq(&mapping->i_pages);
 		shmem_uncharge(mapping->host, nr_none);
 	}
---

Tested with the syzbot reproducer[1], no more crashes :)

[1] https://syzkaller.appspot.com/x/repro.c?x=112d405a580000

Cheers,
Lance

[...]


