From: "Darrick J. Wong" <djwong@kernel.org>
To: Kundan Kumar <kundan.kumar@samsung.com>
Cc: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
willy@infradead.org, mcgrof@kernel.org, clm@meta.com,
david@fromorbit.com, amir73il@gmail.com, axboe@kernel.dk,
hch@lst.de, ritesh.list@gmail.com, dave@stgolabs.net,
cem@kernel.org, wangyufei@vivo.com,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-xfs@vger.kernel.org, gost.dev@samsung.com,
anuj20.g@samsung.com, vishak.g@samsung.com, joshi.k@samsung.com
Subject: Re: [PATCH v3 6/6] xfs: offload writeback by AG using per-inode dirty bitmap and per-AG workers
Date: Thu, 29 Jan 2026 14:34:21 -0800
Message-ID: <20260129223421.GE7712@frogsfrogsfrogs>
In-Reply-To: <20260116100818.7576-7-kundan.kumar@samsung.com>
On Fri, Jan 16, 2026 at 03:38:18PM +0530, Kundan Kumar wrote:
> Offload XFS writeback to per-AG workers based on the inode dirty-AG
> bitmap. Each worker scans and submits writeback only for folios
> belonging to its AG.
>
> Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com>
> Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
> ---
> fs/xfs/xfs_aops.c | 178 ++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 178 insertions(+)
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 9d5b65922cd2..55c3154fb2b5 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -678,6 +678,180 @@ xfs_zoned_writeback_submit(
> return 0;
> }
>
> +static bool xfs_agp_match(struct xfs_inode *ip, pgoff_t index,
> + xfs_agnumber_t agno)
> +{
> + void *ent;
> + u32 v;
> + bool match = false;
> +
> + ent = xa_load(&ip->i_ag_pmap, index);
> + if (ent && xa_is_value(ent)) {
> + v = xa_to_value(ent);
> + if (xfs_agp_valid(v))
> + match = (xfs_agp_agno(v) == (u32)agno);
> + }
> +
> + return match;
> +}
> +
> +static bool xfs_folio_matches_ag(struct folio *folio, xfs_agnumber_t agno)
> +{
> + struct xfs_inode *ip = XFS_I(folio_mapping(folio)->host);
> +
> + return xfs_agp_match(ip, folio->index, agno);
> +}
> +
> +static int xfs_writepages_ag(struct xfs_inode *ip,
> + struct writeback_control *wbc,
> + xfs_agnumber_t agno)
> +{
> + struct inode *inode = VFS_I(ip);
> + struct address_space *mapping = inode->i_mapping;
> + struct folio_batch *fbatch = &wbc->fbatch;
> + int ret = 0;
> + pgoff_t index, end;
> +
> + wbc->range_cyclic = 0;
> +
> + folio_batch_init(fbatch);
> + index = wbc->range_start >> PAGE_SHIFT;
> + end = wbc->range_end >> PAGE_SHIFT;
> +
> + struct xfs_writepage_ctx wpc = {
> + .ctx = {
> + .inode = inode,
> + .wbc = wbc,
> + .ops = &xfs_writeback_ops,
> + },
> + };
> +
> + while (index <= end) {
> + int i, nr;
> +
> + /* get a batch of DIRTY folios starting at index */
> + nr = filemap_get_folios_tag(mapping, &index, end,
> + PAGECACHE_TAG_DIRTY, fbatch);
> + if (!nr)
> + break;
> +
> + for (i = 0; i < nr; i++) {
> + struct folio *folio = fbatch->folios[i];
> +
> + /* Filter BEFORE locking */
> + if (!xfs_folio_matches_ag(folio, agno))
So we grab a batch of dirty folios, and only /then/ check to see if
they've been tagged with the target agno? That doesn't seem very
efficient if there are a lot of dirty folios and they're not evenly
distributed among AGs.
> + continue;
> +
> + folio_lock(folio);
> +
> + /*
> + * Now it's ours: clear dirty and submit.
> + * This prevents *this AG worker* from seeing it again
> + * next time.
> + */
> + if (!folio_clear_dirty_for_io(folio)) {
> + folio_unlock(folio);
> + continue;
> + }
> + xa_erase(&ip->i_ag_pmap, folio->index);
Why erase the association? Is this because once we've written the folio
back to storage, we want a subsequent write to a fsblock within that
folio to tag the folio with the agnumber of that fsblock? Hrm, maybe
that's how you deal with multi-fsblock folios; the folio tag reflects
the first block to be dirtied within the folio?
> +
> + ret = iomap_writeback_folio(&wpc.ctx, folio);
> + folio_unlock(folio);
> +
> + if (ret) {
> + folio_batch_release(fbatch);
> + goto out;
> + }
> + }
> +
> + folio_batch_release(fbatch);
> + cond_resched();
> + }
> +
> +out:
> + if (wpc.ctx.wb_ctx && wpc.ctx.ops && wpc.ctx.ops->writeback_submit)
> + wpc.ctx.ops->writeback_submit(&wpc.ctx, ret);
> +
> + return ret;
> +}
> +
> +static void xfs_ag_writeback_work(struct work_struct *work)
> +{
> + struct xfs_ag_wb *awb = container_of(to_delayed_work(work),
> + struct xfs_ag_wb, ag_work);
> + struct xfs_ag_wb_task *task;
> + struct xfs_mount *mp;
> + struct inode *inode;
> + struct xfs_inode *ip;
> + int ret;
> +
> + for (;;) {
> + spin_lock(&awb->lock);
> + task = list_first_entry_or_null(&awb->task_list,
> + struct xfs_ag_wb_task, list);
> + if (task)
> + list_del_init(&task->list);
> + spin_unlock(&awb->lock);
> +
> + if (!task)
> + break;
> +
> + ip = task->ip;
> + mp = ip->i_mount;
> + inode = VFS_I(ip);
> +
> + ret = xfs_writepages_ag(ip, &task->wbc, task->agno);
> +
> + /* If didn't submit everything for this AG, set its bit */
> + if (ret)
> + set_bit(task->agno, ip->i_ag_dirty_bitmap);
> +
> + iput(inode); /* drop igrab */
> + mempool_free(task, mp->m_ag_task_pool);
> + }
> +}
> +
> +static int xfs_vm_writepages_offload(struct address_space *mapping,
> + struct writeback_control *wbc)
> +{
> + struct inode *inode = mapping->host;
> + struct xfs_inode *ip = XFS_I(inode);
> + struct xfs_mount *mp = ip->i_mount;
> + struct xfs_ag_wb *awb;
> + struct xfs_ag_wb_task *task;
> + xfs_agnumber_t agno;
> +
> + if (!ip->i_ag_dirty_bits)
> + return 0;
> +
> + for_each_set_bit(agno, ip->i_ag_dirty_bitmap, ip->i_ag_dirty_bits) {
> + if (!test_and_clear_bit(agno, ip->i_ag_dirty_bitmap))
> + continue;
> +
> + task = mempool_alloc(mp->m_ag_task_pool, GFP_NOFS);
Allocating memory (even from a mempool) during writeback makes me
nervous...
> + if (!task) {
> + set_bit(agno, ip->i_ag_dirty_bitmap);
> + continue;
> + }
...because apparently the allocation can fail. If so, then why don't we
just fall back to serial writeback instead of ... moving on to the next
AG and seeing if there's more memory?
> +
> + INIT_LIST_HEAD(&task->list);
> + task->ip = ip;
> + task->agno = agno;
> + task->wbc = *wbc;
> + igrab(inode); /* worker owns inode ref */
Shouldn't we check for a null return value here? That should never
happen, but we /do/ have the option of falling back to standard
iomap_writepages.
> +
> + awb = &mp->m_ag_wb[agno];
> +
> + spin_lock(&awb->lock);
> + list_add_tail(&task->list, &awb->task_list);
> + spin_unlock(&awb->lock);
> +
> + mod_delayed_work(mp->m_ag_wq, &awb->ag_work, 0);
> + }
> +
> + return 0;
> +}
> +
> static const struct iomap_writeback_ops xfs_zoned_writeback_ops = {
> .writeback_range = xfs_zoned_writeback_range,
> .writeback_submit = xfs_zoned_writeback_submit,
> @@ -706,6 +880,7 @@ xfs_init_ag_writeback(struct xfs_mount *mp)
> for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
> struct xfs_ag_wb *awb = &mp->m_ag_wb[agno];
>
> + INIT_DELAYED_WORK(&awb->ag_work, xfs_ag_writeback_work);
> spin_lock_init(&awb->lock);
> INIT_LIST_HEAD(&awb->task_list);
> awb->agno = agno;
> @@ -769,6 +944,9 @@ xfs_vm_writepages(
> xfs_open_zone_put(xc.open_zone);
> return error;
> } else {
> + if (wbc->sync_mode != WB_SYNC_ALL)
> + return xfs_vm_writepages_offload(mapping, wbc);
> +
> struct xfs_writepage_ctx wpc = {
> .ctx = {
> .inode = mapping->host,
> --
> 2.25.1
>
>