From: Christoph Hellwig <hch@lst.de>
To: jack@suse.cz, willy@infradead.org
Cc: akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, dlemoal@kernel.org,
	linux-xfs@vger.kernel.org, hans.holmberg@wdc.com
Subject: [PATCH, RFC] limit per-inode writeback size considered harmful
Date: Mon, 13 Oct 2025 16:21:42 +0900
Message-ID: <20251013072738.4125498-1-hch@lst.de>

Hi all,
we have a customer workload where the current core writeback behavior
causes severe fragmentation on zoned XFS despite a friendly write pattern
from the application.  We tracked this down to writeback_chunk_size()
only giving about 30-40MB to each inode before switching to a new inode,
which causes files that are aligned to the zone size (256MB on HDD)
to be fragmented into usually 5-7 extents spread over different zones.
Using the hack below makes this problem go away entirely by always
writing an inode fully up to the zone size.  Damien came up with a
heuristic here:

    https://lore.kernel.org/linux-xfs/20251013070945.GA2446@lst.de/T/#t

that also papers over this, but it falls apart on larger memory
systems where we can cache more of these files in the page cache
than we have open zones.

Does anyone remember the reason for this writeback size limit?  I
looked at the git history and the code touched comes from a refactoring
in 2011, and before that it's really hard to figure out where the
original, even worse behavior came from.  At least for zoned devices,
based on a flag or something similar, we'd love to avoid switching
between inodes during writeback, as that would drastically reduce the
potential for self-induced fragmentation.
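
As a rough illustration of the numbers above (not part of the patch), the
upper bound on extents per file follows from how many writeback chunks a
zone-sized file is split into.  The helper name chunks_per_file() is made
up for this sketch, and it assumes that every chunk boundary is a point
where writeback may move to another inode and the next chunk may land in a
different zone:

```c
/*
 * Illustrative only: number of writeback chunks needed to write a file of
 * file_mb megabytes when each visit to the inode writes at most chunk_mb
 * megabytes.  Rounds up, since a partial chunk is still a separate visit.
 */
static unsigned long chunks_per_file(unsigned long file_mb,
				     unsigned long chunk_mb)
{
	return (file_mb + chunk_mb - 1) / chunk_mb;
}
```

For a 256MB file this gives 9 chunks at 30MB per visit and 7 chunks at
40MB per visit, so the reported 5-7 extents sit within that bound.  With a
chunk size equal to the zone size the file is written in a single visit.
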
---
 fs/fs-writeback.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 2b35e80037fe..9dd9c5f4d86b 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1892,9 +1892,11 @@ static long writeback_chunk_size(struct bdi_writeback *wb,
 	 *                   (quickly) tag currently dirty pages
 	 *                   (maybe slowly) sync all tagged pages
 	 */
-	if (work->sync_mode == WB_SYNC_ALL || work->tagged_writepages)
+	if (1) { /* XXX: check flag */
+		pages = SZ_256M; /* Don't hard code? */
+	} else if (work->sync_mode == WB_SYNC_ALL || work->tagged_writepages) {
 		pages = LONG_MAX;
-	else {
+	} else {
 		pages = min(wb->avg_write_bandwidth / 2,
 			    global_wb_domain.dirty_limit / DIRTY_SCOPE);
 		pages = min(pages, work->nr_pages);
--
2.47.3
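
For readers who want to experiment with the selection logic outside the
kernel, here is a minimal userspace model of the three branches in the
hunk above.  It is a sketch under stated assumptions, not the in-tree
code: struct chunk_inputs, the whole_zone flag, model_chunk_pages() and
the MODEL_* constants are all hypothetical, 4 KiB pages are assumed, and
any rounding done after the lines shown in the hunk is left out.  Because
the variable counts pages, the sketch converts the 256MB byte size to
pages with a shift, whereas the hack assigns SZ_256M directly:

```c
#include <limits.h>
#include <stdbool.h>

#define MODEL_PAGE_SHIFT	12		/* assumption: 4 KiB pages */
#define MODEL_SZ_256M		(256UL << 20)	/* zone size in bytes */

/* Hypothetical, simplified inputs; not the kernel's structures. */
struct chunk_inputs {
	bool whole_zone;		/* hypothetical "write up to a zone" flag */
	bool sync_all;			/* WB_SYNC_ALL or tagged_writepages */
	long avg_write_bandwidth;	/* in pages */
	long dirty_limit_scaled;	/* dirty_limit / DIRTY_SCOPE, in pages */
	long nr_pages;			/* pages requested by the work item */
};

static long min_long(long a, long b)
{
	return a < b ? a : b;
}

/* Same branch order as the hack: flag first, then sync, then bandwidth. */
static long model_chunk_pages(const struct chunk_inputs *in)
{
	long pages;

	if (in->whole_zone)
		return MODEL_SZ_256M >> MODEL_PAGE_SHIFT;
	if (in->sync_all)
		return LONG_MAX;

	pages = min_long(in->avg_write_bandwidth / 2, in->dirty_limit_scaled);
	return min_long(pages, in->nr_pages);
}
```

With the flag set the model returns 65536 pages, i.e. 256MB at 4 KiB per
page, regardless of the bandwidth estimate.  Whether the flag should take
precedence over the sync case is an open design question that the sketch
does not try to answer; it only mirrors the order used in the hack.
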