From: Joanne Koong <joannelkoong@gmail.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-mm@kvack.org, brauner@kernel.org, willy@infradead.org,
hch@infradead.org, djwong@kernel.org, jlayton@kernel.org,
linux-fsdevel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH v2 00/12] mm/iomap: add granular dirty and writeback accounting
Date: Thu, 4 Sep 2025 16:59:16 -0700 [thread overview]
Message-ID: <CAJnrk1acU+bs2uJTSjTgeGYCMavSpokGbrngoTE-3Hay4KZsWw@mail.gmail.com> (raw)
In-Reply-To: <5qgjrq6l627byybxjs6vzouspeqj6hdrx2ohqbxqkkjy65mtz5@zp6pimrpeu4e>
On Thu, Sep 4, 2025 at 1:53 AM Jan Kara <jack@suse.cz> wrote:
>
> Hello!
>
> On Fri 29-08-25 16:39:30, Joanne Koong wrote:
> > This patchset adds granular dirty and writeback stats accounting for large
> > folios.
> >
> > The dirty page balancing logic uses these stats to determine things like
> > whether the ratelimit has been exceeded, the frequency with which pages need
> > to be written back, if dirtying should be throttled, etc. Currently for large
> > folios, if any byte in the folio is dirtied or written back, all the bytes in
> > the folio are accounted as such.
> >
> > In particular, there are four places where dirty and writeback stats get
> > incremented and decremented as pages get dirtied and written back:
> > a) folio dirtying (filemap_dirty_folio() -> ... -> folio_account_dirtied())
> > - increments NR_FILE_DIRTY, NR_ZONE_WRITE_PENDING, WB_RECLAIMABLE,
> > current->nr_dirtied
> >
> > b) writing back a mapping (writeback_iter() -> ... ->
> > folio_clear_dirty_for_io())
> > - decrements NR_FILE_DIRTY, NR_ZONE_WRITE_PENDING, WB_RECLAIMABLE
> >
> > c) starting writeback on a folio (folio_start_writeback())
> > - increments WB_WRITEBACK, NR_WRITEBACK, NR_ZONE_WRITE_PENDING
> >
> > d) ending writeback on a folio (folio_end_writeback())
> > - decrements WB_WRITEBACK, NR_WRITEBACK, NR_ZONE_WRITE_PENDING
>
> I was looking through the patch set. One general concern I have is that it
> all looks somewhat fragile. If you say start writeback on a folio with a
> granular function and happen to end writeback with a non-granular one,
> everything will run fine, just a permanent error in the counters will be
> introduced. Similarly with a dirtying / starting writeback mismatch. The
> practicality of this issue is demostrated by the fact that you didn't
> convert e.g. folio_redirty_for_writepage() so anybody using it together
> with fine-grained accounting will just silently mess up the counters.
> Another issue of a similar kind is that __folio_migrate_mapping() does not
> support fine-grained accounting (and doesn't even have a way to figure out
> proper amount to account) so again any page migration may introduce
> permanent errors into counters. One way to deal with this fragility would
> be to have a flag in the mapping that will determine whether the dirty
> accounting is done by MM or the filesystem (iomap code in your case)
> instead of determining it at the call site.
>
> Another concern I have is the limitation to blocksize >= PAGE_SIZE you
> mention below. That is kind of annoying for filesystems because generally
> they also have to deal with cases of blocksize < PAGE_SIZE and having two
> ways of accounting in one codebase is a big maintenance burden. But this
> was discussed elsewhere in this series and I think you have settled on
> supporting blocksize < PAGE_SIZE as well?
>
> Finally, there is one general issue for which I'd like to hear opinions of
> MM guys: Dirty throttling is a mechanism to avoid a situation where the
> dirty page cache consumes too big amount of memory which makes page reclaim
> hard and the machine thrashes as a result or goes OOM. Now if you dirty a
> 2MB folio, it really makes all those 2MB hard to reclaim (neither direct
> reclaim nor kswapd will be able to reclaim such folio) even though only 1KB
> in that folio needs actual writeback. In this sense it is actually correct
> to account whole big folio as dirty in the counters - if you accounted only
> 1KB or even 4KB (page), a user could with some effort make all page cache
> memory dirty and hard to reclaim without crossing the dirty limits. On the
> other hand if only 1KB in a folio trully needs writeback, the writeback
> will be generally significantly faster than with 2MB needing writeback. So
> in this sense it is correct to account amount to data that trully needs
> writeback.
>
> I don't know what the right answer to this "conflict of interests" is. We
> could keep accounting full folios in the global / memcg counters (to
> protect memory reclaim) and do per page (or even finer) accounting in the
> bdi_writeback which is there to avoid excessive accumulation of dirty data
> (and thus long writeback times) against one device. This should still help
> your case with FUSE and strictlimit (which is generally constrained by
> bdi_writeback counters). One just needs to have a closer look how hard
> would it be to adapt writeback throttling logic to the different
> granularity of global counters and writeback counters...
>
> Honza
Hi Honza,
Thanks for sharing your thoughts on this. Those are good points,
especially the last one about reclaim. I'm curious to hear too what
the mm people think.
If it turns out this patchset is not actually that useful, I'm happy
to drop it.
Thanks,
Joanne
prev parent reply other threads:[~2025-09-04 23:59 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-29 23:39 Joanne Koong
2025-08-29 23:39 ` [PATCH v2 01/12] mm: pass number of pages to __folio_start_writeback() Joanne Koong
2025-09-03 11:48 ` David Hildenbrand
2025-09-03 20:02 ` Darrick J. Wong
2025-09-03 20:05 ` David Hildenbrand
2025-09-03 23:12 ` Joanne Koong
2025-08-29 23:39 ` [PATCH v2 02/12] mm: pass number of pages to __folio_end_writeback() Joanne Koong
2025-08-29 23:39 ` [PATCH v2 03/12] mm: add folio_end_writeback_pages() helper Joanne Koong
2025-08-29 23:39 ` [PATCH v2 04/12] mm: pass number of pages dirtied to __folio_mark_dirty() Joanne Koong
2025-08-29 23:39 ` [PATCH v2 05/12] mm: add filemap_dirty_folio_pages() helper Joanne Koong
2025-08-29 23:39 ` [PATCH v2 06/12] mm: add __folio_clear_dirty_for_io() helper Joanne Koong
2025-08-29 23:39 ` [PATCH v2 07/12] mm: add no_stats_accounting bitfield to wbc Joanne Koong
2025-08-29 23:39 ` [PATCH v2 08/12] mm: refactor clearing dirty stats into helper function Joanne Koong
2025-08-29 23:39 ` [PATCH v2 09/12] mm: add clear_dirty_for_io_stats() helper Joanne Koong
2025-08-29 23:39 ` [PATCH v2 10/12] iomap: refactor dirty bitmap iteration Joanne Koong
2025-09-03 18:53 ` Brian Foster
2025-09-03 19:59 ` Darrick J. Wong
2025-10-03 22:27 ` Joanne Koong
2025-10-04 1:11 ` Joanne Koong
2025-08-29 23:39 ` [PATCH v2 11/12] iomap: refactor uptodate " Joanne Koong
2025-08-29 23:39 ` [PATCH v2 12/12] iomap: add granular dirty and writeback accounting Joanne Koong
2025-09-02 23:46 ` Darrick J. Wong
2025-09-03 18:48 ` Brian Foster
2025-09-04 0:35 ` Joanne Koong
2025-09-04 2:52 ` Darrick J. Wong
2025-09-04 11:47 ` Brian Foster
2025-09-04 20:07 ` Darrick J. Wong
2025-09-05 0:14 ` Joanne Koong
2025-09-05 11:19 ` Brian Foster
2025-09-05 12:43 ` Jan Kara
2025-09-05 23:30 ` Joanne Koong
2025-09-04 8:53 ` [PATCH v2 00/12] mm/iomap: " Jan Kara
2025-09-04 23:59 ` Joanne Koong [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAJnrk1acU+bs2uJTSjTgeGYCMavSpokGbrngoTE-3Hay4KZsWw@mail.gmail.com \
--to=joannelkoong@gmail.com \
--cc=brauner@kernel.org \
--cc=djwong@kernel.org \
--cc=hch@infradead.org \
--cc=jack@suse.cz \
--cc=jlayton@kernel.org \
--cc=kernel-team@meta.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox