linux-mm.kvack.org archive mirror
From: "Darrick J. Wong" <djwong@kernel.org>
To: Kundan Kumar <kundan.kumar@samsung.com>
Cc: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
	willy@infradead.org, mcgrof@kernel.org, clm@meta.com,
	david@fromorbit.com, amir73il@gmail.com, axboe@kernel.dk,
	hch@lst.de, ritesh.list@gmail.com, dave@stgolabs.net,
	cem@kernel.org, wangyufei@vivo.com,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-xfs@vger.kernel.org, gost.dev@samsung.com,
	anuj20.g@samsung.com, vishak.g@samsung.com, joshi.k@samsung.com
Subject: Re: [PATCH v3 3/6] xfs: add per-inode AG prediction map and dirty-AG bitmap
Date: Thu, 5 Feb 2026 08:42:38 -0800
Message-ID: <20260205164238.GS7712@frogsfrogsfrogs>
In-Reply-To: <2c485586-83c9-4697-91fc-7b0cee697704@samsung.com>

On Tue, Feb 03, 2026 at 12:50:53PM +0530, Kundan Kumar wrote:
> On 1/29/2026 6:14 AM, Darrick J. Wong wrote:
> > On Fri, Jan 16, 2026 at 03:38:15PM +0530, Kundan Kumar wrote:
> >> Add per-inode structures to track predicted AGs of dirty folios using
> >> an xarray and bitmap. This enables efficient identification of AGs
> >> involved in writeback.
> >>
> >> Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com>
> >> Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
> >> ---
> >>   fs/xfs/xfs_icache.c | 27 +++++++++++++++++++++++++++
> >>   fs/xfs/xfs_inode.h  |  5 +++++
> >>   2 files changed, 32 insertions(+)
> >>
> >> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> >> index e44040206851..f97aa6d66271 100644
> >> --- a/fs/xfs/xfs_icache.c
> >> +++ b/fs/xfs/xfs_icache.c
> >> @@ -80,6 +80,25 @@ static inline xa_mark_t ici_tag_to_mark(unsigned int tag)
> >>   	return XFS_PERAG_BLOCKGC_MARK;
> >>   }
> >>   
> >> +static int xfs_inode_init_ag_bitmap(struct xfs_inode *ip)
> >> +{
> >> +	unsigned int bits = ip->i_mount->m_sb.sb_agcount;
> >> +	unsigned int nlongs;
> >> +
> >> +	xa_init_flags(&ip->i_ag_pmap, XA_FLAGS_LOCK_IRQ);
> > 
> > This increases the size of struct xfs_inode by 40 bytes...
> > 
> 
> I’ll make this lazy and sparse: move the AG writeback state behind a
> pointer allocated on first use, and replace the bitmap with a sparse
> dirty-AG set (an xarray keyed by agno) so that memory scales with the
> number of AGs the inode actually touches.
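A minimal userspace C model of the lazy, sparse scheme described above, assuming the state hangs off a single per-inode pointer that stays NULL until the first dirtying write. The names `struct ag_wb_state`, `inode_mark_ag_dirty()` and the other helpers are hypothetical, and a sorted array stands in for the xarray keyed by agno; this is a sketch, not the proposed kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-inode AG writeback state, allocated on first use.
 * A sorted array of AG numbers stands in for the xarray keyed by agno. */
struct ag_wb_state {
	uint32_t *agnos;	/* sorted, unique AG numbers with dirty data */
	size_t nr;		/* entries in use */
	size_t cap;		/* entries allocated */
};

struct inode_model {
	struct ag_wb_state *agwb;	/* NULL until the first dirtying write */
};

/* Record that @agno holds dirty data for @ip; 0 on success, -1 on ENOMEM. */
static int inode_mark_ag_dirty(struct inode_model *ip, uint32_t agno)
{
	struct ag_wb_state *s = ip->agwb;
	size_t i = 0;

	if (!s) {
		s = calloc(1, sizeof(*s));
		if (!s)
			return -1;
		ip->agwb = s;
	}

	/* Find the insertion point; an AG already present costs nothing. */
	while (i < s->nr && s->agnos[i] < agno)
		i++;
	if (i < s->nr && s->agnos[i] == agno)
		return 0;

	if (s->nr == s->cap) {
		size_t ncap = s->cap ? s->cap * 2 : 4;
		uint32_t *n = realloc(s->agnos, ncap * sizeof(*n));

		if (!n)
			return -1;
		s->agnos = n;
		s->cap = ncap;
	}
	memmove(&s->agnos[i + 1], &s->agnos[i],
		(s->nr - i) * sizeof(*s->agnos));
	s->agnos[i] = agno;
	s->nr++;
	assert(s->nr <= s->cap);
	return 0;
}

static bool inode_ag_is_dirty(const struct inode_model *ip, uint32_t agno)
{
	if (!ip->agwb)
		return false;
	for (size_t i = 0; i < ip->agwb->nr; i++)
		if (ip->agwb->agnos[i] == agno)
			return true;
	return false;
}

/* Teardown, e.g. from the inode free path. */
static void inode_free_ag_state(struct inode_model *ip)
{
	if (!ip->agwb)
		return;
	free(ip->agwb->agnos);
	free(ip->agwb);
	ip->agwb = NULL;
}
```

In this model an inode that is never written pays for one pointer, and one that dirties k AGs pays for roughly k entries, independent of sb_agcount.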
> 
> >> +	ip->i_ag_dirty_bitmap = NULL;
> >> +	ip->i_ag_dirty_bits = bits;
> >> +
> >> +	if (!bits)
> >> +		return 0;
> >> +
> >> +	nlongs = BITS_TO_LONGS(bits);
> >> +	ip->i_ag_dirty_bitmap = kcalloc(nlongs, sizeof(unsigned long),
> >> +					GFP_NOFS);
> > 
> > ...and there could be hundreds or thousands of AGs for each filesystem.
> > That's a lot of kernel memory to handle this prediction stuff, and I'm
> > not even sure what ag_dirty_bitmap does yet.
> > 
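To put rough numbers on the memory concern: the v3 patch asks kcalloc() for BITS_TO_LONGS(sb_agcount) longs per inode. The userspace helpers below (hypothetical names; slab rounding and allocator overhead ignored) compute that per-inode cost and its total across a number of cached inodes.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Mirrors the kernel's BITS_TO_LONGS(): round a bit count up to whole longs. */
#define BPL_MODEL		(sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS_MODEL(nr)	(((nr) + BPL_MODEL - 1) / BPL_MODEL)

/* Bytes requested per inode for a dirty-AG bitmap of @agcount bits. */
static size_t ag_bitmap_bytes(unsigned int agcount)
{
	return BITS_TO_LONGS_MODEL((size_t)agcount) * sizeof(unsigned long);
}

/* Total across @ninodes cached inodes, before any slab overhead. */
static unsigned long long ag_bitmap_total(unsigned int agcount,
					  unsigned long long ninodes)
{
	return (unsigned long long)ag_bitmap_bytes(agcount) * ninodes;
}
```

On a 64-bit build, 1000 AGs come to 16 longs, or 128 bytes per inode, so a million cached inodes would carry about 128 MB of bitmaps on top of the fixed growth of struct xfs_inode itself.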
> 
> The bit for an AG is set in ag_dirty_bitmap at write time. During
> writeback, we check which AG bits are set, wake only those AG-specific
> workers, and each worker scans the page cache, filters folios tagged for
> its AG, and submits the I/O.
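The write-time/writeback-time split described above can be modelled in userspace C as follows. Everything here is a hypothetical stand-in: a fixed-size bitmap for i_ag_dirty_bitmap, an array of (index, agno) pairs for the tagged page cache, and plain function calls in place of waking per-AG workers. Clearing each bit at dispatch time is also an assumption of the model.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define NR_AGS_MODEL	128	/* hypothetical AG count for the model */
#define BPL		(sizeof(unsigned long) * CHAR_BIT)
#define AG_BITMAP_LONGS	((NR_AGS_MODEL + BPL - 1) / BPL)

struct folio_model {
	uint64_t index;		/* pagecache index of the folio */
	uint32_t agno;		/* AG tag recorded at write time */
};

/* Write path: note that @agno now has dirty folios for this inode. */
static void mark_ag_dirty(unsigned long *bm, uint32_t agno)
{
	assert(agno < NR_AGS_MODEL);
	bm[agno / BPL] |= 1UL << (agno % BPL);
}

static int test_and_clear_ag(unsigned long *bm, uint32_t agno)
{
	unsigned long mask = 1UL << (agno % BPL);
	int was_set = (bm[agno / BPL] & mask) != 0;

	bm[agno / BPL] &= ~mask;
	return was_set;
}

/* Hypothetical per-AG worker: walk every dirty folio, keep only ours. */
static size_t ag_worker(const struct folio_model *folios, size_t n,
			uint32_t agno)
{
	size_t submitted = 0;

	for (size_t i = 0; i < n; i++)
		if (folios[i].agno == agno)
			submitted++;	/* stand-in for building and submitting I/O */
	return submitted;
}

/* Writeback: wake a worker only for AGs whose bit is set. */
static unsigned int writeback_dispatch(unsigned long *bm,
				       const struct folio_model *folios,
				       size_t n, size_t *total)
{
	unsigned int woken = 0;

	*total = 0;
	for (uint32_t agno = 0; agno < NR_AGS_MODEL; agno++) {
		if (!test_and_clear_ag(bm, agno))
			continue;
		woken++;
		*total += ag_worker(folios, n, agno);
	}
	return woken;
}
```

In this model every woken worker walks all of the inode's dirty folios and skips those tagged for other AGs, so the total scan work grows with the number of dirty AGs.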
> 
> >> +
> >> +	return ip->i_ag_dirty_bitmap ? 0 : -ENOMEM;
> >> +}
> >> +
> >>   /*
> >>    * Allocate and initialise an xfs_inode.
> >>    */
> >> @@ -131,6 +150,8 @@ xfs_inode_alloc(
> >>   	ip->i_next_unlinked = NULLAGINO;
> >>   	ip->i_prev_unlinked = 0;
> >>   
> >> +	xfs_inode_init_ag_bitmap(ip);
> > 
> > Unchecked return value???
> 
> Will fix this in the next version.
> 
> > 
> >> +
> >>   	return ip;
> >>   }
> >>   
> >> @@ -194,6 +215,12 @@ xfs_inode_free(
> >>   	ip->i_ino = 0;
> >>   	spin_unlock(&ip->i_flags_lock);
> >>   
> >> +	/* free xarray contents (values are immediate packed ints) */
> >> +	xa_destroy(&ip->i_ag_pmap);
> >> +	kfree(ip->i_ag_dirty_bitmap);
> >> +	ip->i_ag_dirty_bitmap = NULL;
> >> +	ip->i_ag_dirty_bits = 0;
> >> +
> >>   	__xfs_inode_free(ip);
> >>   }
> >>   
> >> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> >> index bd6d33557194..dee449168605 100644
> >> --- a/fs/xfs/xfs_inode.h
> >> +++ b/fs/xfs/xfs_inode.h
> >> @@ -99,6 +99,11 @@ typedef struct xfs_inode {
> >>   	spinlock_t		i_ioend_lock;
> >>   	struct work_struct	i_ioend_work;
> >>   	struct list_head	i_ioend_list;
> >> +
> >> +	/* AG prediction map: pgoff_t -> packed u32 */
> > 
> > What about blocksize < pagesize filesystems?  Which packed agno do you
> > associate with the pgoff_t?
> > 
> > Also, do you have an xarray entry for each pgoff_t in a large folio?
> > 
> > --D
> > 
> 
> pgoff_t here is the pagecache index (folio->index), i.e. file offset in
> PAGE_SIZE units, not a filesystem block index. So blocksize < PAGE_SIZE
> doesn’t change the association; the packed agno is attached to the folio
> at that pagecache index.
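A small userspace C illustration of the distinction, assuming 4 KiB pages and 1 KiB filesystem blocks; the helper names are hypothetical. Several filesystem blocks map to the same pagecache index, so a single packed agno per folio index has to stand for all of them.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_MODEL 12	/* assumption: 4 KiB pages */

/* Pagecache index (pgoff_t): file offset in PAGE_SIZE units. */
static uint64_t pos_to_index(uint64_t pos)
{
	return pos >> PAGE_SHIFT_MODEL;
}

/* File block number: file offset in filesystem-block units. */
static uint64_t pos_to_fileblock(uint64_t pos, unsigned int blkbits)
{
	return pos >> blkbits;
}

/* Filesystem blocks that share a single pagecache index. */
static unsigned int blocks_per_page(unsigned int blkbits)
{
	assert(blkbits <= PAGE_SHIFT_MODEL);
	return 1u << (PAGE_SHIFT_MODEL - blkbits);
}
```

With 1 KiB blocks, file offsets 0, 1024, 2048 and 3072 are four distinct file blocks (0 to 3) but all sit at pagecache index 0.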

Ok, so the tag is entirely determined by the AG of the first fsblock
within the folio.
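For reference, XFS encodes an fsblock number as the AG number shifted left by sb_agblklog, OR'ed with the AG-relative block, so the AG of the folio's first block falls out of a shift. The userspace sketch below mirrors that encoding (cf. XFS_AGB_TO_FSB() and XFS_FSB_TO_AGNO()); `struct bmap_model` and the helper names are hypothetical, and the series may obtain the mapping differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mapping: file blocks [startoff, startoff + blockcount)
 * live at fsblock startblock onwards. */
struct bmap_model {
	uint64_t startoff;
	uint64_t startblock;	/* XFS-style fsblock number */
	uint64_t blockcount;
};

/* fsbno = (agno << agblklog) | agbno, as XFS_AGB_TO_FSB() builds it. */
static uint64_t agb_to_fsb(uint32_t agno, uint32_t agbno, unsigned int agblklog)
{
	return ((uint64_t)agno << agblklog) | agbno;
}

/* The AG is the high part of the fsblock number (cf. XFS_FSB_TO_AGNO()). */
static uint32_t fsb_to_agno(uint64_t fsbno, unsigned int agblklog)
{
	return (uint32_t)(fsbno >> agblklog);
}

/* AG tag for a folio, taken from the folio's first file block only. */
static uint32_t folio_first_block_agno(const struct bmap_model *map,
				       uint64_t first_fileblock,
				       unsigned int agblklog)
{
	assert(first_fileblock >= map->startoff &&
	       first_fileblock < map->startoff + map->blockcount);
	return fsb_to_agno(map->startblock + (first_fileblock - map->startoff),
			   agblklog);
}
```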

> We store one xarray entry per folio index (the start of the folio). We
> do not create entries for each base page inside a large folio. If a
> large folio could span multiple extents/AGs, we’ll treat the hint as
> advisory and tag it invalid (falling back to normal writeback routing)
> rather than trying to encode per-subpage AGs.

Oh, ok, so if you have the mapping and the folio at the same time you
can determine that the entire large folio maps to a single extent, and
tag the whole large folio as belonging to a single AG.  That clears
things up, thank you.

It's only in the case of extreme fragmentation that a large folio gets
flung at the old writeback paths, which is probably good enough anyway.

--D

> >> +	struct xarray           i_ag_pmap;
> >> +	unsigned long           *i_ag_dirty_bitmap;
> >> +	unsigned int            i_ag_dirty_bits;
> >>   } xfs_inode_t;
> >>   
> >>   static inline bool xfs_inode_on_unlinked_list(const struct xfs_inode *ip)
> >> -- 
> >> 2.25.1
> >>
> >>
> > 
> 
> 

