From: Matthew Wilcox <willy@infradead.org>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
linux-f2fs-devel@lists.sourceforge.net,
Jens Axboe <axboe@kernel.dk>,
linux-fsdevel@vger.kernel.org
Subject: Re: Calling folio_end_writeback() inside a spinlock under task context?
Date: Mon, 3 Mar 2025 04:52:15 +0000
Message-ID: <Z8U1f4g5av7L1Tv-@casper.infradead.org>
In-Reply-To: <14bd34c8-8fe0-4440-8dfd-e71223303edc@gmx.com>
Adding Jens to the cc. As you well know, he added this code, so I'm
mystified why you didn't cc him. Also adding linux-fsdevel (I presume
this was a mistake and you inadvertently cc'd f2fs-devel instead.)
On Mon, Mar 03, 2025 at 10:34:26AM +1030, Qu Wenruo wrote:
> [SPINLOCK AND END WRITEBACK]
>
> Although folio_end_writeback() can be called in interrupt context
> (judging by its in_task() check), surprisingly it may not be safe to
> call inside a spinlock held in task context:
It's poor practice to do that; you're introducing a dependency between
your fs lock and the i_mapping lock, which is already pretty intertwined
with ... every other lock in the system.
> For example the following call chain can lead to sleep:
>
> spin_lock()
> folio_end_writeback()
> |- folio_end_dropbehind_write()
> |- folio_launder()
> Which can sleep.
>
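In sketch form, the problematic pattern is simply this (the lock and the
function below are placeholders, not actual btrfs code):

#include <linux/pagemap.h>
#include <linux/spinlock.h>

/*
 * Sketch: ending writeback while holding a spinlock.  For a dropbehind
 * (DONTCACHE) folio, folio_end_writeback() can reach
 * folio_end_dropbehind_write() -> folio_launder(), which may sleep, so
 * this can sleep between spin_lock() and spin_unlock().
 */
static void end_writeback_under_lock(spinlock_t *fs_lock, struct folio *folio)
{
	spin_lock(fs_lock);
	/* ... update fs-private per-folio writeback bookkeeping ... */
	folio_end_writeback(folio);	/* may sleep via folio_launder() */
	spin_unlock(fs_lock);
}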
> My question is, can and should we allow folio_end_writeback() inside a
> spinlock?
>
> [BTRFS' ASYNC EXTENT PROBLEM]
>
> This is again a btrfs-specific behavior: we have to call
> folio_end_writeback() inside a spinlock to make sure that really only
> one thread can clear the writeback flag of a folio.
>
> I know iomap does a pretty good job of preventing an early-finished
> writeback from clearing the folio writeback flag while we're still
> submitting other blocks inside the folio.
>
> iomap holds one extra writeback count before it starts marking blocks
> writeback and submitting them.
> Only after all blocks are submitted does it drop that extra count (and
> call folio_end_writeback() if it was the last holder of the writeback
> flag).
>
> But the iomap solution requires that all blocks inside a folio be
> submitted at the same time.
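In sketch form, that bias pattern is roughly the following
(write_bytes_pending and struct iomap_folio_state are the real iomap
names; submit_dirty_blocks() is an invented stand-in for the per-block
submission loop in iomap_writepage_map()):

static void iomap_submit_folio_sketch(struct iomap_folio_state *ifs,
				      struct folio *folio)
{
	/*
	 * Bias: hold one extra count so that an early I/O completion
	 * cannot end folio writeback while later blocks of the same
	 * folio are still being submitted.
	 */
	atomic_inc(&ifs->write_bytes_pending);

	/* Each submitted block adds its length; each completion subtracts it. */
	submit_dirty_blocks(ifs, folio);

	/*
	 * Drop the bias; if every submitted block has already completed,
	 * we are the last holder and must end writeback ourselves.
	 */
	if (atomic_dec_and_test(&ifs->write_bytes_pending))
		folio_end_writeback(folio);
}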
I actually don't like the iomap solution as it's currently written; it
just hasn't risen high enough up my todo list to fix it.
What we should do is initialise folio->ifs->write_bytes_pending to
bitmap_weight(ifs->state, blocks_per_folio) * block_size in
iomap_writepage_map() [this is oversimplified; we'd need to handle eof
and so on too]
That would address your problem as well as do fewer atomic operations.
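To make that concrete, a very rough sketch of the idea (the function name
is invented, and as said above, eof handling and the exact layout of
ifs->state are glossed over):

static void ifs_preset_write_bytes_pending(struct iomap_folio_state *ifs,
					   unsigned int blocks_per_folio,
					   unsigned int block_size)
{
	unsigned int nr_blocks = bitmap_weight(ifs->state, blocks_per_folio);

	/*
	 * Nothing has been submitted yet, so a single store replaces both
	 * the per-block atomic increments and the extra bias inc/dec.
	 */
	atomic_set(&ifs->write_bytes_pending, nr_blocks * block_size);
}

Completions then only ever subtract, and whichever completion brings the
count to zero calls folio_end_writeback(); there is no window where the
count can reach zero before the whole folio has been handled.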
> This is not true in btrfs, due to the design of btrfs' async extents,
> which can mark blocks for writeback at almost any time: we keep the
> folios locked and queue them into a workqueue for compression, and only
> after the compression is done do we submit them, mark them writeback
> and unlock them.
>
> If we do not call folio_end_writeback() while holding the spinlock,
> but only check under the spinlock whether we're the last holder and
> then call folio_end_writeback() outside of it, we can hit the following
> race where folio_end_writeback() is called twice on the same folio:
>
> 0 32K 64K
> |<- AE 1 ->|<- AE 2 ->|
>
> Thread A (AE 1) | Thread B (AE 2)
> --------------------------------------+------------------------------
> submit_one_async_extent() |
> |- process_one_folio() |
> |- subpage_set_writeback() |
> |
> /* IO finished */ |
> end_compressed_writeback() |
> |- btrfs_folio_clear_writeback() |
> |- spin_lock() |
> | /* this is the last writeback |
> | holder, should end the |
> | folio writeback flag */ |
> |- last = true |
> |- spin_unlock() |
> | | submit_one_async_extent()
> | | |- process_one_folio()
> | | |- subpage_set_writeback()
This seems weird. Why are you setting the "subpage" writeback bit
while the folio writeback bit is still set? That smells like a
bug-in-waiting if not an actual bug to me. Surely it should wait on
the folio writeback bit to clear?
> | | /* IO finished */
> | | end_compressed_writeback()
> | | |btrfs_folio_clear_writeback()
> | | | /* Again the last holder */
> | | |- spin_lock()
> | | |- last = true
> | | |- spin_unlock()
> |- folio_end_writeback() | |- folio_end_writeback()
>
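Reduced to code, the racy variant described before the diagram looks
something like this (the struct and helper below are simplified stand-ins
for the real btrfs subpage code):

struct subpage_writeback_state {	/* simplified stand-in */
	spinlock_t lock;
	unsigned long *bitmap;		/* one writeback bit per block */
	unsigned int nr_bits;
};

static void subpage_clear_writeback(struct subpage_writeback_state *sp,
				    struct folio *folio,
				    unsigned int start_bit, unsigned int nbits)
{
	bool last;

	spin_lock_irq(&sp->lock);
	bitmap_clear(sp->bitmap, start_bit, nbits);
	last = bitmap_empty(sp->bitmap, sp->nr_bits);
	spin_unlock_irq(&sp->lock);

	/*
	 * Racy: between the unlock and this call, another async extent can
	 * set and clear its own bits, also see "last == true", and end up
	 * calling folio_end_writeback() on the same folio a second time.
	 */
	if (last)
		folio_end_writeback(folio);
}

which is why btrfs currently keeps the folio_end_writeback() call inside
the spinlock instead.  (The alternative hinted at above would be for the
second submit_one_async_extent() to folio_wait_writeback() before setting
its subpage bit, closing the window from the other side.)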
> I know the proper solution would be to get rid of the delayed unlock
> and submission completely, mark blocks for writeback without the async
> extent mechanism, and then follow iomap's solution.
>
> But that will be a huge change (although we will go down that path
> eventually); meanwhile the folio_end_writeback()-inside-a-spinlock
> problem needs to be fixed asap.
I'd suggest the asap solution is for btrfs to mark itself as not
supporting dropbehind.
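For context, dropbehind folios only come from the new uncached buffered
I/O path, which is opt-in per filesystem via the FOP_DONTCACHE bit in
file_operations::fop_flags.  Assuming btrfs advertises (or is about to
advertise) that bit, "not supporting dropbehind" is just a matter of not
setting it; roughly:

const struct file_operations btrfs_file_operations = {
	.read_iter	= btrfs_file_read_iter,
	.write_iter	= btrfs_file_write_iter,
	/* ... other ops and existing fop_flags bits unchanged ... */
	/*
	 * Deliberately no FOP_DONTCACHE: RWF_DONTCACHE is then rejected
	 * with -EOPNOTSUPP, no folio ever gets the dropbehind flag, and
	 * folio_end_writeback() cannot reach the sleeping
	 * folio_end_dropbehind_write() path.
	 */
};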
> So my question is, can we allow folio_end_writeback() to be called
> inside a spinlock, at the cost of screwing up the folio reclaim for
> DONTCACHE?
>
> Thanks,
> Qu
>