From: Matthew Wilcox <willy@infradead.org>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
linux-f2fs-devel@lists.sourceforge.net,
Jens Axboe <axboe@kernel.dk>,
linux-fsdevel@vger.kernel.org
Subject: Re: Calling folio_end_writeback() inside a spinlock under task context?
Date: Mon, 3 Mar 2025 04:52:15 +0000
Message-ID: <Z8U1f4g5av7L1Tv-@casper.infradead.org>
In-Reply-To: <14bd34c8-8fe0-4440-8dfd-e71223303edc@gmx.com>
Adding Jens to the cc. As you well know, he added this code, so I'm
mystified why you didn't cc him. Also adding linux-fsdevel (I presume
this was a mistake and you inadvertently cc'd f2fs-devel instead.)
On Mon, Mar 03, 2025 at 10:34:26AM +1030, Qu Wenruo wrote:
> [SPINLOCK AND END WRITEBACK]
>
> Although folio_end_writeback() can be called in interrupt context
> (judging by its in_task() check), surprisingly it may not be safe to
> call inside a spinlock held in task context:
It's poor practice to do that; you're introducing a dependency between
your fs lock and the i_mapping lock, which is already pretty intertwined
with ... every other lock in the system.
> For example the following call chain can lead to sleep:
>
> spin_lock()
> folio_end_writeback()
> |- folio_end_dropbehind_write()
> |- folio_launder()
> Which can sleep.
>
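In sketch form, the problematic pattern is simply this (the lock and the
function below are placeholders, not actual btrfs code):

#include <linux/pagemap.h>
#include <linux/spinlock.h>

/*
 * Sketch: ending writeback while holding a spinlock.  For a dropbehind
 * (DONTCACHE) folio, folio_end_writeback() can reach
 * folio_end_dropbehind_write() -> folio_launder(), which may sleep, so
 * this can sleep between spin_lock() and spin_unlock().
 */
static void end_writeback_under_lock(spinlock_t *fs_lock, struct folio *folio)
{
	spin_lock(fs_lock);
	/* ... update fs-private per-folio writeback bookkeeping ... */
	folio_end_writeback(folio);	/* may sleep via folio_launder() */
	spin_unlock(fs_lock);
}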
> My question is, can and should we allow folio_end_writeback() inside a
> spinlock?
>
> [BTRFS' ASYNC EXTENT PROBLEM]
>
> This is again a btrfs-specific behavior: we have to call
> folio_end_writeback() inside a spinlock to make sure that really only
> one thread can clear the writeback flag of a folio.
>
> I know iomap does a pretty good job of preventing an early-finished
> writeback from clearing the folio writeback flag while we're still
> submitting other blocks inside the folio.
>
> iomap holds one extra writeback count before it starts marking blocks
> writeback and submitting them.
> Only after all blocks are submitted does it drop that extra count (and
> call folio_end_writeback() if it was the last holder of the writeback
> flag).
>
> But the iomap solution requires that all blocks inside a folio be
> submitted at the same time.
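In sketch form, that bias pattern is roughly the following
(write_bytes_pending and struct iomap_folio_state are the real iomap
names; submit_dirty_blocks() is an invented stand-in for the per-block
submission loop in iomap_writepage_map()):

static void iomap_submit_folio_sketch(struct iomap_folio_state *ifs,
				      struct folio *folio)
{
	/*
	 * Bias: hold one extra count so that an early I/O completion
	 * cannot end folio writeback while later blocks of the same
	 * folio are still being submitted.
	 */
	atomic_inc(&ifs->write_bytes_pending);

	/* Each submitted block adds its length; each completion subtracts it. */
	submit_dirty_blocks(ifs, folio);

	/*
	 * Drop the bias; if every submitted block has already completed,
	 * we are the last holder and must end writeback ourselves.
	 */
	if (atomic_dec_and_test(&ifs->write_bytes_pending))
		folio_end_writeback(folio);
}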
I actually don't like the iomap solution as it's currently written; it
just hasn't risen high enough up my todo list to fix it.
What we should do is initialise folio->ifs->write_bytes_pending to
bitmap_weight(ifs->state, blocks_per_folio) * block_size in
iomap_writepage_map() [this is oversimplified; we'd need to handle eof
and so on too]
That would address your problem as well as do fewer atomic operations.
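To make that concrete, a very rough sketch of the idea (the function name
is invented, and as said above, eof handling and the exact layout of
ifs->state are glossed over):

static void ifs_preset_write_bytes_pending(struct iomap_folio_state *ifs,
					   unsigned int blocks_per_folio,
					   unsigned int block_size)
{
	unsigned int nr_blocks = bitmap_weight(ifs->state, blocks_per_folio);

	/*
	 * Nothing has been submitted yet, so a single store replaces both
	 * the per-block atomic increments and the extra bias inc/dec.
	 */
	atomic_set(&ifs->write_bytes_pending, nr_blocks * block_size);
}

Completions then only ever subtract, and whichever completion brings the
count to zero calls folio_end_writeback(); there is no window where the
count can reach zero before the whole folio has been handled.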
> This is not true in btrfs, due to the design of btrfs' async extents,
> which can mark blocks for writeback at almost any time: we keep the
> folios locked and queue them into a workqueue for compression, and only
> after the compression is done do we submit them, mark them writeback
> and unlock them.
>
> If we do not call folio_end_writeback() while holding the spinlock,
> but only check under the spinlock whether we're the last holder and
> then call folio_end_writeback() outside of it, we can hit the following
> race where folio_end_writeback() is called twice on the same folio:
>
> 0 32K 64K
> |<- AE 1 ->|<- AE 2 ->|
>
> Thread A (AE 1) | Thread B (AE 2)
> --------------------------------------+------------------------------
> submit_one_async_extent() |
> |- process_one_folio() |
> |- subpage_set_writeback() |
> |
> /* IO finished */ |
> end_compressed_writeback() |
> |- btrfs_folio_clear_writeback() |
> |- spin_lock() |
> | /* this is the last writeback |
> | holder, should end the |
> | folio writeback flag */ |
> |- last = true |
> |- spin_unlock() |
> | | submit_one_async_extent()
> | | |- process_one_folio()
> | | |- subpage_set_writeback()
This seems weird. Why are you setting the "subpage" writeback bit
while the folio writeback bit is still set? That smells like a
bug-in-waiting if not an actual bug to me. Surely it should wait on
the folio writeback bit to clear?
> | | /* IO finished */
> | | end_compressed_writeback()
> | | |btrfs_folio_clear_writeback()
> | | | /* Again the last holder */
> | | |- spin_lock()
> | | |- last = true
> | | |- spin_unlock()
> |- folio_end_writeback() | |- folio_end_writeback()
>
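Reduced to code, the racy variant described before the diagram looks
something like this (the struct and helper below are simplified stand-ins
for the real btrfs subpage code):

struct subpage_writeback_state {	/* simplified stand-in */
	spinlock_t lock;
	unsigned long *bitmap;		/* one writeback bit per block */
	unsigned int nr_bits;
};

static void subpage_clear_writeback(struct subpage_writeback_state *sp,
				    struct folio *folio,
				    unsigned int start_bit, unsigned int nbits)
{
	bool last;

	spin_lock_irq(&sp->lock);
	bitmap_clear(sp->bitmap, start_bit, nbits);
	last = bitmap_empty(sp->bitmap, sp->nr_bits);
	spin_unlock_irq(&sp->lock);

	/*
	 * Racy: between the unlock and this call, another async extent can
	 * set and clear its own bits, also see "last == true", and end up
	 * calling folio_end_writeback() on the same folio a second time.
	 */
	if (last)
		folio_end_writeback(folio);
}

which is why btrfs currently keeps the folio_end_writeback() call inside
the spinlock instead.  (The alternative hinted at above would be for the
second submit_one_async_extent() to folio_wait_writeback() before setting
its subpage bit, closing the window from the other side.)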
> I know the proper solution would be to get rid of the delayed unlock
> and submission completely, mark blocks for writeback without the async
> extent mechanism, and then follow iomap's solution.
>
> But that will be a huge change (although we will go down that path
> eventually); meanwhile the folio_end_writeback()-inside-a-spinlock
> problem needs to be fixed asap.
I'd suggest the asap solution is for btrfs to mark itself as not
supporting dropbehind.
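For context, dropbehind folios only come from the new uncached buffered
I/O path, which is opt-in per filesystem via the FOP_DONTCACHE bit in
file_operations::fop_flags.  Assuming btrfs advertises (or is about to
advertise) that bit, "not supporting dropbehind" is just a matter of not
setting it; roughly:

const struct file_operations btrfs_file_operations = {
	.read_iter	= btrfs_file_read_iter,
	.write_iter	= btrfs_file_write_iter,
	/* ... other ops and existing fop_flags bits unchanged ... */
	/*
	 * Deliberately no FOP_DONTCACHE: RWF_DONTCACHE is then rejected
	 * with -EOPNOTSUPP, no folio ever gets the dropbehind flag, and
	 * folio_end_writeback() cannot reach the sleeping
	 * folio_end_dropbehind_write() path.
	 */
};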
> So my question is, can we allow folio_end_writeback() to be called
> inside a spinlock, at the cost of screwing up the folio reclaim for
> DONTCACHE?
>
> Thanks,
> Qu
>