From: Mike Snitzer <snitzer@kernel.org>
To: Zorro Lang <zlang@redhat.com>
Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-xfs@vger.kernel.org
Subject: Re: [Bug report][xfstests generic/751] hang on nfs writeback
Date: Wed, 29 Oct 2025 16:50:13 -0400
Message-ID: <aQJ-BfBblQ5hlgm1@kernel.org>
In-Reply-To: <aPkFLDsQ71BrTXt7@kernel.org>

On Wed, Oct 22, 2025 at 12:24:12PM -0400, Mike Snitzer wrote:
> On Tue, Oct 21, 2025 at 01:14:08PM +0800, Zorro Lang wrote:
> > Hi,
> >
> > When I did an xfstests regression test on nfs, it hung on generic/751 many times,
> > refer to [1] (on x86_64) and [2] (on aarch64). I've hit it 8 times so far.
>
> You've hit it 8 times, all on the same kernel? Any idea if this is a
> regression specific to 6.18-rc1 and later?
>
> > I tested on the latest linux v6.18-rc2. My underlying test device isn't a loopback
> > device, it's a regular disk, with XFS made on it:
> > meta-data=/dev/sda5 isize=512 agcount=4, agsize=983040 blks
> > = sectsz=4096 attr=2, projid32bit=1
> > = crc=1 finobt=1, sparse=1, rmapbt=1
> > = reflink=1 bigtime=1 inobtcount=1 nrext64=1
> > = exchange=0 metadir=0
> > data = bsize=4096 blocks=3932160, imaxpct=25
> > = sunit=0 swidth=0 blks
> > naming =version 2 bsize=4096 ascii-ci=0, ftype=1, parent=0
> > log =internal log bsize=4096 blocks=54324, version=2
> > = sectsz=4096 sunit=1 blks, lazy-count=1
> > realtime =none extsz=4096 blocks=0, rtextents=0
> > = rgcount=0 rgsize=0 extents
> > = zoned=0 start=0 reserved=0
> >
> > Two XFS filesystems are mounted on /mnt/xfstests/test and /mnt/xfstests/scratch
> > separately, then exported as:
> > # cat /etc/exports
> > /mnt/xfstests/test/nfs-server *(rw,insecure,no_root_squash)
> > /mnt/xfstests/scratch/nfs-server *(rw,insecure,no_root_squash)
> >
> > The nfs mount option is only "-o vers=4.2". BTW, the nfs server and client are
> > on the same machine/system.
>
> Since the NFS client and server are on the same localhost, NFS's LOCALIO is
> being used (nfslocaliod is mentioned in the "locks held" section), so IO
> bypasses going over the wire to NFSD and is instead issued directly from the
> NFS client through the LOCALIO workqueue and then down to XFS.
>
> I'll try to make sense of the locking (reported below) but this will
> take some time.
>
> Cc'ing the XFS list because this is NFS LOCALIO on top of XFS; it could be
> that there is something fundamental in LOCALIO's nfslocaliod workqueue that
> sets us up for deadlock if XFS takes a specific path...
>
> NOTE: in fs/nfs/inode.c:nfsiod_start(), XFS's nature has already influenced
> LOCALIO workqueue config hacks... there could be dragons lurking:
>
> /*
>  * Start the nfsiod workqueues
>  */
> static int nfsiod_start(void)
> {
> 	dprintk("RPC: creating workqueue nfsiod\n");
> 	nfsiod_workqueue = alloc_workqueue("nfsiod", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
> 	if (nfsiod_workqueue == NULL)
> 		return -ENOMEM;
> #if IS_ENABLED(CONFIG_NFS_LOCALIO)
> 	/*
> 	 * localio writes need to use a normal (non-memreclaim) workqueue.
> 	 * When we start getting low on space, XFS goes and calls flush_work() on
> 	 * a non-memreclaim work queue, which causes a priority inversion problem.
> 	 */
> 	dprintk("RPC: creating workqueue nfslocaliod\n");
> 	nfslocaliod_workqueue = alloc_workqueue("nfslocaliod", WQ_UNBOUND, 0);
> 	if (unlikely(nfslocaliod_workqueue == NULL)) {
> 		nfsiod_stop();
> 		return -ENOMEM;
> 	}
> #endif /* CONFIG_NFS_LOCALIO */
> 	return 0;
> }
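
To make the concern in that comment concrete, here is a minimal illustrative
sketch (hypothetical names, not code from any tree) of why flushing work on a
queue created without WQ_MEM_RECLAIM from a context that is itself part of
memory reclaim is dangerous: such a queue has no rescuer thread, so forward
progress depends on ordinary kworkers, which may be unable to run precisely
because memory is low.

#include <linux/workqueue.h>

static struct workqueue_struct *plain_wq;	/* created without WQ_MEM_RECLAIM */
static struct work_struct demo_work;

static void demo_work_fn(struct work_struct *work)
{
	/* pretend this does writeback/IO-completion style processing */
}

static int demo_init(void)
{
	/* No WQ_MEM_RECLAIM means no rescuer thread for this queue. */
	plain_wq = alloc_workqueue("demo_plain", WQ_UNBOUND, 0);
	if (!plain_wq)
		return -ENOMEM;
	INIT_WORK(&demo_work, demo_work_fn);
	return 0;
}

static void demo_reclaim_path(void)
{
	queue_work(plain_wq, &demo_work);
	/*
	 * If this wait is reached from a memory-reclaim context (e.g. a
	 * filesystem flushing under low memory), completing demo_work_fn
	 * depends on an ordinary kworker making progress -- which may not
	 * happen while memory is scarce, so the reclaimer can stall.
	 */
	flush_work(&demo_work);
}
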
>
> Mike
>
>
> > # cat local.config
> > export FSTYP=nfs
> > export TEST_DEV=xxxx-xxx-xxxx.xxxx.xxx.xxx:/mnt/xfstests/test/nfs-server
> > export TEST_DIR=/mnt/xfstests/test/nfs-client
> > export SCRATCH_DEV=xxxx-xxx-xxxx.xxxx.xxx.xxx:/mnt/xfstests/scratch/nfs-server
> > export SCRATCH_MNT=/mnt/xfstests/scratch/nfs-client
> > export MOUNT_OPTIONS="-o vers=4.2"
> > export TEST_FS_MOUNT_OPTS="-o vers=4.2"
> >
> >
> > [1]
> > [23369.572660] run fstests generic/751 at 2025-10-20 23:15:24
> > [-- MARK -- Tue Oct 21 03:20:00 2025]
> > [-- MARK -- Tue Oct 21 03:25:00 2025]
> > [-- MARK -- Tue Oct 21 03:30:00 2025]
> > [-- MARK -- Tue Oct 21 03:35:00 2025]
> > [-- MARK -- Tue Oct 21 03:40:00 2025]
> > [25069.500900] INFO: task kworker/u9:73:825900 blocked for more than 122 seconds.
> > [25069.501484] Not tainted 6.18.0-rc2 #1
> > [25069.501822] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [25069.502385] task:kworker/u9:73 state:D stack:24712 pid:825900 tgid:825900 ppid:2 task_flags:0x4208060 flags:0x00080000
> > [25069.503258] Workqueue: writeback wb_workfn (flush-0:48)
> > [25069.782725] Call Trace:
> > [25069.787191] <TASK>
> > [25069.787368] __schedule+0x838/0x1890
> > [25069.799533] ? __pfx___schedule+0x10/0x10
> > [25069.799852] ? __blk_flush_plug+0x27b/0x4d0
> > [25069.800617] ? find_held_lock+0x32/0x90
> > [25069.801292] ? __lock_release.isra.0+0x1a4/0x2c0
> > [25069.801639] schedule+0xd4/0x260
> > [25069.801902] io_schedule+0x8f/0xf0
> > [25069.802162] folio_wait_bit_common+0x2d9/0x780
> > [25069.802919] ? folio_wait_bit_common+0x1dd/0x780
> > [25069.803324] ? __pfx_folio_wait_bit_common+0x10/0x10
> > [25069.803684] ? nfs_page_clear_headlock+0x31/0x80 [nfs]
> > [25069.804254] ? __pfx_wake_page_function+0x10/0x10
> > [25069.804602] ? __pfx___might_resched+0x10/0x10
> > [25069.805160] writeback_get_folio+0x3f9/0x500
> > [25069.806221] writeback_iter+0x136/0x720
> > [25069.806628] nfs_writepages+0x4f8/0x9b0 [nfs]
> > [25069.807100] ? mark_held_locks+0x40/0x70
> > [25069.807403] ? __pfx_nfs_writepages+0x10/0x10 [nfs]
> > [25069.807846] ? virtqueue_notify+0x68/0xc0
> > [25069.808346] ? virtio_queue_rq+0x2b1/0x650 [virtio_blk]
> > [25069.808925] ? __lock_acquire+0x57c/0xbd0
> > [25069.809295] do_writepages+0x21f/0x560
> > [25069.809576] ? __pfx_do_writepages+0x10/0x10
> > [25069.809953] ? rcu_is_watching+0x15/0xb0
> > [25069.810587] __writeback_single_inode+0xe2/0x5f0
> > [25069.810982] ? __lock_release.isra.0+0x1a4/0x2c0
> > [25069.811373] ? __pfx___writeback_single_inode+0x10/0x10
> > [25069.811815] ? writeback_sb_inodes+0x416/0xd00
> > [25069.812194] writeback_sb_inodes+0x535/0xd00
> > [25069.812533] ? __pfx_stack_trace_save+0x10/0x10
> > [25069.814193] ? local_clock_noinstr+0xd/0xe0
> > [25069.814506] ? __pfx_writeback_sb_inodes+0x10/0x10
> > [25069.814910] ? __lock_release.isra.0+0x1a4/0x2c0
> > [25069.815407] ? lock_acquire+0x10b/0x150
> > [25069.815703] ? down_read_trylock+0x4b/0x60
> > [25069.816077] __writeback_inodes_wb+0xf4/0x270
> > [25069.816441] ? __pfx___writeback_inodes_wb+0x10/0x10
> > [25069.816864] ? queue_io+0x329/0x510
> > [25069.817148] wb_writeback+0x70a/0x9c0
> > [25069.817448] ? __pfx_wb_writeback+0x10/0x10
> > [25069.817831] ? get_nr_dirty_inodes+0xcb/0x180
> > [25069.818264] wb_do_writeback+0x5d4/0x8e0
> > [25069.818575] ? __pfx_wb_do_writeback+0x10/0x10
> > [25069.818975] ? set_worker_desc+0x16e/0x190
> > [25069.819313] ? __pfx_set_worker_desc+0x10/0x10
> > [25069.819669] wb_workfn+0x7c/0x200
> > [25069.819997] process_one_work+0xd8b/0x1320
> > [25069.820342] ? __pfx_process_one_work+0x10/0x10
> > [25069.820694] ? assign_work+0x16c/0x240
> > [25069.821053] worker_thread+0x5f3/0xfe0
> > [25069.821364] ? __pfx_worker_thread+0x10/0x10
> > [25069.821697] kthread+0x3b4/0x770
> > [25069.822122] ? kvm_sched_clock_read+0x11/0x20
> > [25069.822457] ? local_clock_noinstr+0xd/0xe0
> > [25069.822790] ? __pfx_kthread+0x10/0x10
> > [25069.823079] ? __lock_release.isra.0+0x1a4/0x2c0
> > [25069.823430] ? rcu_is_watching+0x15/0xb0
> > [25069.823729] ? __pfx_kthread+0x10/0x10
> > [25069.824035] ret_from_fork+0x393/0x480
> > [25069.864794] ? __pfx_kthread+0x10/0x10
> > [25069.865076] ? __pfx_kthread+0x10/0x10
> > [25069.865371] ret_from_fork_asm+0x1a/0x30
> > [25069.879905] </TASK>
> > [25069.880145]
> > [25069.880145] Showing all locks held in the system:
> > [25069.889550] 1 lock held by khungtaskd/37:
> > [25069.889886] #0: ffffffffa29309e0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire.constprop.0+0x7/0x30
> > [25069.890603] 2 locks held by 751/825733:
> > [25069.890942] #0: ffff888101e36440 (sb_writers#16){.+.+}-{0:0}, at: ksys_write+0xf9/0x1d0
> > [25069.891589] #1: ffffffffa2b97fd0 (split_debug_mutex){+.+.}-{4:4}, at: split_huge_pages_write+0x124/0x430
> > [25069.892449] 3 locks held by kworker/u9:73/825900:
> > [25069.892818] #0: ffff8881029df958 ((wq_completion)writeback){+.+.}-{0:0}, at: process_one_work+0x7f5/0x1320
> > [25069.893530] #1: ffffc9000ad9fd10 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_one_work+0xd3f/0x1320
> > [25069.894333] #2: ffff88810b1480e8 (&type->s_umount_key#68){++++}-{4:4}, at: super_trylock_shared+0x1c/0xa0
> > [25069.895056] 2 locks held by kworker/u10:2/826054:
> > [25069.895418] #0: ffff888103ec2158 ((wq_completion)nfsiod){+.+.}-{0:0}, at: process_one_work+0x7f5/0x1320
> > [25069.896123] #1: ffffc90001e1fd10 ((work_completion)(&ctx->work)){+.+.}-{0:0}, at: process_one_work+0xd3f/0x1320
> > [25069.896952]
> > [25069.897085] =============================================
I was able to test xfstests generic/751 on a server in the Red Hat lab, but so
far I used a different baseline kernel (it's 6.12.53-based but has all of the
NFSD, NFS and SUNRPC changes from 6.18-rc1 backported). The test runs without
deadlock, but there is one lockdep splat because this commit is missing:
commit 45f69d091bab64a332fe751da9829dcd136348fd
Author: Long Li <leo.lilong@huawei.com>
Date:   Sat Jun 22 16:26:31 2024 +0800

    xfs: eliminate lockdep false positives in xfs_attr_shortform_list

    xfs_attr_shortform_list() only called from a non-transactional context, it
    hold ilock before alloc memory and maybe trapped in memory reclaim. Since
    commit 204fae32d5f7("xfs: clean up remaining GFP_NOFS users") removed
    GFP_NOFS flag, lockdep warning will be report as [1]. Eliminate lockdep
    false positives by use __GFP_NOLOCKDEP to alloc memory
    in xfs_attr_shortform_list().

    [1] https://lore.kernel.org/linux-xfs/000000000000e33add0616358204@google.com/

    Reported-by: syzbot+4248e91deb3db78358a2@syzkaller.appspotmail.com
    Signed-off-by: Long Li <leo.lilong@huawei.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Carlos Maiolino <cem@kernel.org>
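
For context, a hedged sketch of the technique that commit describes (not the
literal upstream diff; the helper name below is hypothetical): the listing
buffer is allocated while the inode's ILOCK is already held in a
non-transactional path, so direct reclaim triggered by the allocation could
recurse into the filesystem and lockdep reports an (apparently false-positive)
inversion. Annotating the allocation with __GFP_NOLOCKDEP tells lockdep not to
track it, silencing the report without otherwise changing reclaim behaviour.

#include <linux/gfp.h>
#include <linux/slab.h>

/*
 * Hypothetical helper, for illustration only: allocate the attr listing
 * buffer with __GFP_NOLOCKDEP because the caller already holds the ILOCK
 * outside of a transaction.
 */
static void *alloc_attr_list_buffer(size_t sbsize)
{
	return kmalloc(sbsize, GFP_KERNEL | __GFP_NOLOCKDEP);
}
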
After applying that commit, a generic/751 retest comes up clean. I'll now
switch to testing the latest v6.18-rcX kernel.
Mike