From: Brian Foster <bfoster@redhat.com>
To: "Lai, Yi" <yi1.lai@linux.intel.com>
Cc: linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
linux-mm@kvack.org, hch@infradead.org, djwong@kernel.org,
willy@infradead.org, brauner@kernel.org, yi1.lai@intel.com,
syzkaller-bugs@googlegroups.com
Subject: Re: [PATCH v5 5/7] xfs: fill dirty folios on zero range of unwritten mappings
Date: Fri, 5 Dec 2025 08:57:13 -0500 [thread overview]
Message-ID: <aTLkuabg_fP49Gjv@bfoster> (raw)
In-Reply-To: <aTJLAFyYBtW47r5Q@ly-workstation>
On Fri, Dec 05, 2025 at 11:01:20AM +0800, Lai, Yi wrote:
> On Fri, Oct 03, 2025 at 09:46:39AM -0400, Brian Foster wrote:
> > Use the iomap folio batch mechanism to select folios to zero on zero
> > range of unwritten mappings. Trim the resulting mapping if the batch
> > is filled (unlikely for current use cases) to distinguish between a
> > range to skip and one that requires another iteration due to a full
> > batch.
> >
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
> > ---
> > fs/xfs/xfs_iomap.c | 23 +++++++++++++++++++++++
> > 1 file changed, 23 insertions(+)
> >
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index 6a05e04ad5ba..535bf3b8705d 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -1702,6 +1702,8 @@ xfs_buffered_write_iomap_begin(
> > struct iomap *iomap,
> > struct iomap *srcmap)
> > {
> > + struct iomap_iter *iter = container_of(iomap, struct iomap_iter,
> > + iomap);
> > struct xfs_inode *ip = XFS_I(inode);
> > struct xfs_mount *mp = ip->i_mount;
> > xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset);
> > @@ -1773,6 +1775,7 @@ xfs_buffered_write_iomap_begin(
> > */
> > if (flags & IOMAP_ZERO) {
> > xfs_fileoff_t eof_fsb = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
> > + u64 end;
> >
> > if (isnullstartblock(imap.br_startblock) &&
> > offset_fsb >= eof_fsb)
> > @@ -1780,6 +1783,26 @@ xfs_buffered_write_iomap_begin(
> > if (offset_fsb < eof_fsb && end_fsb > eof_fsb)
> > end_fsb = eof_fsb;
> >
> > + /*
> > + * Look up dirty folios for unwritten mappings within EOF.
> > + * Providing this bypasses the flush iomap uses to trigger
> > + * extent conversion when unwritten mappings have dirty
> > + * pagecache in need of zeroing.
> > + *
> > + * Trim the mapping to the end pos of the lookup, which in turn
> > + * was trimmed to the end of the batch if it became full before
> > + * the end of the mapping.
> > + */
> > + if (imap.br_state == XFS_EXT_UNWRITTEN &&
> > + offset_fsb < eof_fsb) {
> > + loff_t len = min(count,
> > + XFS_FSB_TO_B(mp, imap.br_blockcount));
> > +
> > + end = iomap_fill_dirty_folios(iter, offset, len);
> > + end_fsb = min_t(xfs_fileoff_t, end_fsb,
> > + XFS_B_TO_FSB(mp, end));
> > + }
> > +
> > xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
> > }
> >
> > --
> > 2.51.0
> >
>
> Hi Brian Foster,
>
> Greetings!
>
> I used Syzkaller and found that there is possible deadlock in xfs_ilock in linux-next next-20251203.
>
> After bisection and the first bad commit is:
> "
> 77c475692c5e xfs: fill dirty folios on zero range of unwritten mappings
> "
>
The referenced reproducer doesn't throw anything for me, but if you want
to test the following:
https://lore.kernel.org/linux-fsdevel/20251113135404.553339-1-bfoster@redhat.com/
... that removes the allocation associated with this splat. Thanks.
Brian
> All detailed into can be found at:
> https://github.com/laifryiee/syzkaller_logs/tree/main/251204_221645_xfs_ilock
> Syzkaller repro code:
> https://github.com/laifryiee/syzkaller_logs/tree/main/251204_221645_xfs_ilock/repro.c
> Syzkaller repro syscall steps:
> https://github.com/laifryiee/syzkaller_logs/tree/main/251204_221645_xfs_ilock/repro.prog
> Syzkaller report:
> https://github.com/laifryiee/syzkaller_logs/tree/main/251204_221645_xfs_ilock/repro.report
> Kconfig(make olddefconfig):
> https://github.com/laifryiee/syzkaller_logs/tree/main/251204_221645_xfs_ilock/kconfig_origin
> Bisect info:
> https://github.com/laifryiee/syzkaller_logs/tree/main/251204_221645_xfs_ilock/bisect_info.log
> bzImage:
> https://github.com/laifryiee/syzkaller_logs/raw/refs/heads/main/251204_221645_xfs_ilock/bzImage_b2c27842ba853508b0da00187a7508eb3a96c8f7
> Issue dmesg:
> https://github.com/laifryiee/syzkaller_logs/blob/main/251204_221645_xfs_ilock/b2c27842ba853508b0da00187a7508eb3a96c8f7_dmesg.log
>
> "
> [ 21.088994] ======================================================
> [ 21.089362] WARNING: possible circular locking dependency detected
> [ 21.089726] 6.18.0-next-20251203-b2c27842ba85 #1 Not tainted
> [ 21.090060] ------------------------------------------------------
> [ 21.090417] kswapd0/58 is trying to acquire lock:
> [ 21.090697] ffff888028ff1f18 (&xfs_nondir_ilock_class){++++}-{4:4}, at: xfs_ilock+0x30f/0x390
> [ 21.091235]
> [ 21.091235] but task is already holding lock:
> [ 21.091575] ffffffff8784b580 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xb7e/0x15c0
> [ 21.092058]
> [ 21.092058] which lock already depends on the new lock.
> [ 21.092058]
> [ 21.092524]
> [ 21.092524] the existing dependency chain (in reverse order) is:
> [ 21.092949]
> [ 21.092949] -> #1 (fs_reclaim){+.+.}-{0:0}:
> [ 21.093290] fs_reclaim_acquire+0x116/0x160
> [ 21.093579] __kmalloc_cache_noprof+0x53/0x7e0
> [ 21.093886] iomap_fill_dirty_folios+0x118/0x2c0
> [ 21.094204] xfs_buffered_write_iomap_begin+0xf18/0x2150
> [ 21.094552] iomap_iter+0x551/0xf40
> [ 21.094798] iomap_zero_range+0x20b/0xa90
> [ 21.095075] xfs_zero_range+0xb5/0x100
> [ 21.095335] xfs_reflink_remap_prep+0x3d3/0xa90
> [ 21.095643] xfs_file_remap_range+0x23c/0xdc0
> [ 21.095944] vfs_clone_file_range+0x2b1/0xda0
> [ 21.096243] ioctl_file_clone+0x6e/0x110
> [ 21.096521] do_vfs_ioctl+0xcab/0x14d0
> [ 21.096786] __x64_sys_ioctl+0x127/0x220
> [ 21.097057] x64_sys_call+0x1280/0x21b0
> [ 21.097331] do_syscall_64+0x6d/0x1180
> [ 21.097607] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 21.097936]
> [ 21.097936] -> #0 (&xfs_nondir_ilock_class){++++}-{4:4}:
> [ 21.098334] __lock_acquire+0x14d1/0x2210
> [ 21.098615] lock_acquire+0x170/0x2f0
> [ 21.098869] down_write_nested+0x9a/0x210
> [ 21.099145] xfs_ilock+0x30f/0x390
> [ 21.099385] xfs_icwalk_ag+0xaec/0x1b60
> [ 21.099652] xfs_icwalk+0x56/0xc0
> [ 21.099892] xfs_reclaim_inodes_nr+0x1d3/0x2d0
> [ 21.100192] xfs_fs_free_cached_objects+0x6a/0x90
> [ 21.100506] super_cache_scan+0x415/0x570
> [ 21.100794] do_shrink_slab+0x408/0x1030
> [ 21.101069] shrink_slab+0x348/0x12f0
> [ 21.101329] shrink_node+0xacc/0x2670
> [ 21.101587] balance_pgdat+0xa2d/0x15c0
> [ 21.101860] kswapd+0x5b9/0xab0
> [ 21.102093] kthread+0x464/0x980
> [ 21.102329] ret_from_fork+0x780/0x8f0
> [ 21.102596] ret_from_fork_asm+0x1a/0x30
> [ 21.102873]
> [ 21.102873] other info that might help us debug this:
> [ 21.102873]
> [ 21.103335] Possible unsafe locking scenario:
> [ 21.103335]
> [ 21.103683] CPU0 CPU1
> [ 21.103955] ---- ----
> [ 21.104225] lock(fs_reclaim);
> [ 21.104428] lock(&xfs_nondir_ilock_class);
> [ 21.104823] lock(fs_reclaim);
> [ 21.105158] lock(&xfs_nondir_ilock_class);
> [ 21.105416]
> [ 21.105416] *** DEADLOCK ***
> [ 21.105416]
> [ 21.105762] 2 locks held by kswapd0/58:
> [ 21.105993] #0: ffffffff8784b580 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xb7e/0x15c0
> [ 21.106487] #1: ffff88800fd580e0 (&type->s_umount_key#53){.+.+}-{4:4}, at: super_cache_scan+0x9f/0x570
> [ 21.107047]
> [ 21.107047] stack backtrace:
> [ 21.107307] CPU: 1 UID: 0 PID: 58 Comm: kswapd0 Not tainted 6.18.0-next-20251203-b2c27842ba85 #1 PREEMPT(volu
> [ 21.107319] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.q4
> [ 21.107326] Call Trace:
> [ 21.107335] <TASK>
> [ 21.107338] dump_stack_lvl+0xea/0x150
> [ 21.107352] dump_stack+0x19/0x20
> [ 21.107359] print_circular_bug+0x283/0x350
> [ 21.107370] check_noncircular+0x12d/0x150
> [ 21.107383] __lock_acquire+0x14d1/0x2210
> [ 21.107398] lock_acquire+0x170/0x2f0
> [ 21.107407] ? xfs_ilock+0x30f/0x390
> [ 21.107420] ? __cond_resched+0x37/0x50
> [ 21.107434] down_write_nested+0x9a/0x210
> [ 21.107445] ? xfs_ilock+0x30f/0x390
> [ 21.107456] ? __pfx_down_write_nested+0x10/0x10
> [ 21.107468] ? xfs_icwalk_ag+0xadf/0x1b60
> [ 21.107482] ? xfs_icwalk_ag+0xaec/0x1b60
> [ 21.107497] ? xfs_icwalk_ag+0xaec/0x1b60
> [ 21.107510] xfs_ilock+0x30f/0x390
> [ 21.107523] xfs_icwalk_ag+0xaec/0x1b60
> [ 21.107542] ? __pfx_xfs_icwalk_ag+0x10/0x10
> [ 21.107561] ? __pfx_xa_find+0x10/0x10
> [ 21.107581] ? xfs_group_grab_next_mark+0x26a/0x520
> [ 21.107605] ? __this_cpu_preempt_check+0x21/0x30
> [ 21.107616] ? lock_release+0x14f/0x2a0
> [ 21.107628] ? xfs_group_grab_next_mark+0x274/0x520
> [ 21.107643] ? __pfx_xfs_group_grab_next_mark+0x10/0x10
> [ 21.107662] ? __pfx_try_to_wake_up+0x10/0x10
> [ 21.107678] ? lock_release+0x14f/0x2a0
> [ 21.107689] xfs_icwalk+0x56/0xc0
> [ 21.107704] xfs_reclaim_inodes_nr+0x1d3/0x2d0
> [ 21.107718] ? __pfx_xfs_reclaim_inodes_nr+0x10/0x10
> [ 21.107734] ? __this_cpu_preempt_check+0x21/0x30
> [ 21.107744] ? __pfx_prune_icache_sb+0x10/0x10
> [ 21.107762] xfs_fs_free_cached_objects+0x6a/0x90
> [ 21.107777] super_cache_scan+0x415/0x570
> [ 21.107794] do_shrink_slab+0x408/0x1030
> [ 21.107813] shrink_slab+0x348/0x12f0
> [ 21.107831] ? shrink_slab+0x160/0x12f0
> [ 21.107845] ? __pfx_shrink_slab+0x10/0x10
> [ 21.107866] shrink_node+0xacc/0x2670
> [ 21.107888] ? __pfx_shrink_node+0x10/0x10
> [ 21.107900] ? preempt_schedule_common+0x49/0xd0
> [ 21.107913] balance_pgdat+0xa2d/0x15c0
> [ 21.107929] ? __pfx_balance_pgdat+0x10/0x10
> [ 21.107941] ? rcu_watching_snap_stopped_since+0x20/0xf0
> [ 21.107975] kswapd+0x5b9/0xab0
> [ 21.107990] ? __pfx_kswapd+0x10/0x10
> [ 21.108002] ? _raw_spin_unlock_irqrestore+0x35/0x70
> [ 21.108017] ? trace_hardirqs_on+0x26/0x130
> [ 21.108040] ? __pfx_autoremove_wake_function+0x10/0x10
> [ 21.108060] ? __sanitizer_cov_trace_const_cmp1+0x1e/0x30
> [ 21.108080] ? __kthread_parkme+0x1bc/0x260
> [ 21.108094] ? __pfx_kswapd+0x10/0x10
> [ 21.108107] ? __pfx_kswapd+0x10/0x10
> [ 21.108120] kthread+0x464/0x980
> [ 21.108128] ? __pfx_kthread+0x10/0x10
> [ 21.108135] ? trace_hardirqs_on+0x26/0x130
> [ 21.108149] ? _raw_spin_unlock_irq+0x3c/0x60
> [ 21.108158] ? __pfx_kthread+0x10/0x10
> [ 21.108167] ret_from_fork+0x780/0x8f0
> [ 21.108177] ? __pfx_ret_from_fork+0x10/0x10
> [ 21.108186] ? native_load_tls+0x16/0x50
> [ 21.108199] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
> [ 21.108213] ? __switch_to+0x823/0x10b0
> [ 21.108232] ? __pfx_kthread+0x10/0x10
> [ 21.108240] ret_from_fork_asm+0x1a/0x30
> [ 21.108257] </TASK>
> [ 21.592826] repro: page allocation failure: order:0, mode:0x10cc0(GFP_KERNEL|__GFP_NORETRY), nodemask=(null),0
> [ 21.593533] CPU: 1 UID: 0 PID: 727 Comm: repro Not tainted 6.18.0-next-20251203-b2c27842ba85 #1 PREEMPT(volun
> [ 21.593545] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.q4
> [ 21.593551] Call Trace:
> [ 21.593554] <TASK>
> [ 21.593557] dump_stack_lvl+0x121/0x150
> [ 21.593572] dump_stack+0x19/0x20
> [ 21.593582] warn_alloc+0x216/0x360
> [ 21.593595] ? __pfx_warn_alloc+0x10/0x10
> [ 21.593607] ? __pfx___alloc_pages_direct_compact+0x10/0x10
> [ 21.593618] ? __drain_all_pages+0x27d/0x480
> [ 21.593628] __alloc_pages_slowpath.constprop.0+0x1340/0x2230
> [ 21.593644] ? __pfx___alloc_pages_slowpath.constprop.0+0x10/0x10
> [ 21.593657] ? __might_sleep+0x108/0x160
> [ 21.593680] __alloc_frozen_pages_noprof+0x47f/0x550
> [ 21.593690] ? asm_sysvec_apic_timer_interrupt+0x1f/0x30
> [ 21.593702] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10
> [ 21.593716] ? policy_nodemask+0xf9/0x450
> [ 21.593734] alloc_pages_mpol+0x236/0x4c0
> [ 21.593746] ? __pfx_alloc_pages_mpol+0x10/0x10
> [ 21.593758] ? alloc_frozen_pages_noprof+0x48/0x180
> [ 21.593766] ? alloc_frozen_pages_noprof+0x51/0x180
> [ 21.593775] alloc_frozen_pages_noprof+0xa9/0x180
> [ 21.593783] alloc_pages_noprof+0x27/0xa0
> [ 21.593791] kimage_alloc_pages+0x78/0x240
> [ 21.593809] kimage_alloc_control_pages+0x1ca/0xa60
> [ 21.593819] ? __pfx_kimage_alloc_control_pages+0x10/0x10
> [ 21.593827] ? __sanitizer_cov_trace_cmp8+0x1c/0x30
> [ 21.593844] do_kexec_load+0x39b/0x8c0
> [ 21.593851] ? __might_fault+0xf1/0x1b0
> [ 21.593868] ? __pfx_do_kexec_load+0x10/0x10
> [ 21.593876] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
> [ 21.593887] ? _copy_from_user+0x75/0xa0
> [ 21.593904] __x64_sys_kexec_load+0x1cc/0x240
> [ 21.593913] x64_sys_call+0x1c90/0x21b0
> [ 21.593922] do_syscall_64+0x6d/0x1180
> [ 21.593930] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 21.593938] RIP: 0033:0x7f347b83ee5d
> [ 21.593952] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 88
> [ 21.593959] RSP: 002b:00007ffc6cb1d938 EFLAGS: 00000207 ORIG_RAX: 00000000000000f6
> [ 21.593972] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f347b83ee5d
> [ 21.593977] RDX: 0000200000000180 RSI: 0000000000000003 RDI: 0000000000000000
> [ 21.593982] RBP: 00007ffc6cb1d950 R08: 00007ffc6cb1d3c0 R09: 00007ffc6cb1d950
> [ 21.593986] R10: 0000000000000000 R11: 0000000000000207 R12: 00007ffc6cb1daa8
> [ 21.593991] R13: 00000000004030f5 R14: 000000000040ee08 R15: 00007f347bb26000
> [ 21.594000] </TASK>
> "
>
> Hope this cound be insightful to you.
>
> Regards,
> Yi Lai
>
> ---
>
> If you don't need the following environment to reproduce the problem or if you
> already have one reproduced environment, please ignore the following information.
>
> How to reproduce:
> git clone https://gitlab.com/xupengfe/repro_vm_env.git
> cd repro_vm_env
> tar -xvf repro_vm_env.tar.gz
> cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64 and I used v7.1.0
> // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
> // You could change the bzImage_xxx as you want
> // Maybe you need to remove line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for different qemu version
> You could use below command to log in, there is no password for root.
> ssh -p 10023 root@localhost
>
> After login vm(virtual machine) successfully, you could transfer reproduced
> binary to the vm by below way, and reproduce the problem in vm:
> gcc -pthread -o repro repro.c
> scp -P 10023 repro root@localhost:/root/
>
> Get the bzImage for target kernel:
> Please use target kconfig and copy it to kernel_src/.config
> make olddefconfig
> make -jx bzImage //x should equal or less than cpu num your pc has
>
> Fill the bzImage file into above start3.sh to load the target kernel in vm.
>
>
> Tips:
> If you already have qemu-system-x86_64, please ignore below info.
> If you want to install qemu v7.1.0 version:
> git clone https://github.com/qemu/qemu.git
> cd qemu
> git checkout -f v7.1.0
> mkdir build
> cd build
> yum install -y ninja-build.x86_64
> yum -y install libslirp-devel.x86_64
> ../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
> make
> make install
>
>
next prev parent reply other threads:[~2025-12-05 13:57 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-03 13:46 [PATCH v5 0/7] iomap: zero range folio batch support Brian Foster
2025-10-03 13:46 ` [PATCH v5 1/7] filemap: add helper to look up dirty folios in a range Brian Foster
2025-10-03 13:46 ` [PATCH v5 2/7] iomap: remove pos+len BUG_ON() to after folio lookup Brian Foster
2025-10-03 13:46 ` [PATCH v5 3/7] iomap: optional zero range dirty folio processing Brian Foster
2025-10-03 13:46 ` [PATCH v5 4/7] xfs: always trim mapping to requested range for zero range Brian Foster
2025-10-03 13:46 ` [PATCH v5 5/7] xfs: fill dirty folios on zero range of unwritten mappings Brian Foster
2025-12-05 3:01 ` Lai, Yi
2025-12-05 13:57 ` Brian Foster [this message]
2025-10-03 13:46 ` [PATCH v5 6/7] iomap: remove old partial eof zeroing optimization Brian Foster
2025-10-03 13:46 ` [PATCH v5 7/7] xfs: error tag to force zeroing on debug kernels Brian Foster
2025-10-07 11:12 ` [PATCH v5 0/7] iomap: zero range folio batch support Christian Brauner
2025-10-21 0:14 ` Joanne Koong
2025-10-29 12:32 ` Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aTLkuabg_fP49Gjv@bfoster \
--to=bfoster@redhat.com \
--cc=brauner@kernel.org \
--cc=djwong@kernel.org \
--cc=hch@infradead.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-xfs@vger.kernel.org \
--cc=syzkaller-bugs@googlegroups.com \
--cc=willy@infradead.org \
--cc=yi1.lai@intel.com \
--cc=yi1.lai@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox