* [PATCH, RFC] limit per-inode writeback size considered harmful
@ 2025-10-13  7:21 Christoph Hellwig
  2025-10-13 11:01 ` Jan Kara
  0 siblings, 1 reply; 3+ messages in thread
From: Christoph Hellwig @ 2025-10-13  7:21 UTC (permalink / raw)
  To: jack, willy
  Cc: akpm, linux-fsdevel, linux-mm, dlemoal, linux-xfs, hans.holmberg

Hi all,

we have a customer workload where the current core writeback behavior
causes severe fragmentation on zoned XFS despite a friendly write pattern
from the application.  We tracked this down to writeback_chunk_size()
only giving each inode about 30-40MB before switching to the next inode,
which causes files that are aligned to the zone size (256MB on HDDs) to
usually be fragmented into 5-7 extents spread over different zones.
Using the hack below makes this problem go away entirely by always
writing an inode out in full, up to the zone size.  Damien came up with
a heuristic here:

  https://lore.kernel.org/linux-xfs/20251013070945.GA2446@lst.de/T/#t

that also papers over this, but it falls apart on larger-memory
systems where we can cache more of these files in the page cache
than we have open zones.
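
To put rough numbers on the fragmentation described above, using only
the figures already mentioned: 256MB divided by ~35-40MB per writeback
chunk works out to roughly 6-7 chunks per zone-aligned file, and with
each chunk potentially landing in a different open zone that lines up
with the 5-7 extents seen in practice.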

Does anyone remember the reason for this per-inode writeback size
limit?  I looked at the git history: the code touched here comes from a
refactoring in 2011, and before that it's really hard to figure out
where the original, even worse behavior came from.  At least for zoned
devices we'd love to avoid switching between inodes during writeback,
based on a flag or something similar, as that would drastically reduce
the potential for self-induced fragmentation.  (A rough sketch of a
flag-based variant follows the patch below.)

---
 fs/fs-writeback.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 2b35e80037fe..9dd9c5f4d86b 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1892,9 +1892,11 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
 	 *                   (quickly) tag currently dirty pages
 	 *                   (maybe slowly) sync all tagged pages
 	 */
-	if (work->sync_mode == WB_SYNC_ALL || work->tagged_writepages)
+	if (1) { /* XXX: check flag */
+		pages = SZ_256M; /* Don't hard code? */
+	} else if (work->sync_mode == WB_SYNC_ALL || work->tagged_writepages) {
 		pages = LONG_MAX;
-	else {
+	} else {
 		pages = min(wb->avg_write_bandwidth / 2,
 			    global_wb_domain.dirty_limit / DIRTY_SCOPE);
 		pages = min(pages, work->nr_pages);
-- 
2.47.3
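
Purely as an illustration of the "XXX: check flag" comment in the hack
above, a less hard-coded variant could look roughly like the sketch
below.  BDI_CAP_FULL_INODE_WRITEBACK and bdi_zone_chunk_pages() are
made-up names, not existing kernel interfaces; also note that
writeback_chunk_size() returns a page count, so the zone size wants to
be expressed in pages (e.g. SZ_256M >> PAGE_SHIFT) rather than bytes:

	/*
	 * Hypothetical sketch only: let a zoned backing device ask for
	 * whole-inode writeback chunks sized to its zone size.
	 */
	if (wb->bdi->capabilities & BDI_CAP_FULL_INODE_WRITEBACK) {
		/* zone size in pages, not bytes */
		pages = bdi_zone_chunk_pages(wb->bdi);
	} else if (work->sync_mode == WB_SYNC_ALL || work->tagged_writepages) {
		pages = LONG_MAX;
	} else {
		pages = min(wb->avg_write_bandwidth / 2,
			    global_wb_domain.dirty_limit / DIRTY_SCOPE);
		pages = min(pages, work->nr_pages);
	}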





Thread overview: 3+ messages
2025-10-13  7:21 [PATCH, RFC] limit per-inode writeback size considered harmful Christoph Hellwig
2025-10-13 11:01 ` Jan Kara
2025-10-13 21:16   ` Dave Chinner
