Re: __folio_end_writeback() lockdep issue

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* Re: __folio_end_writeback() lockdep issue
       [not found] <9b845a47-9aee-43dd-99bc-1a82bea00442@bsbernd.com>
@ 2026-01-10 15:31 ` Bernd Schubert
  2026-01-10 16:30   ` Matthew Wilcox
  0 siblings, 1 reply; 3+ messages in thread
From: Bernd Schubert @ 2026-01-10 15:31 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Miklos Szeredi, Horst Birthelmer, linux-fsdevel,
	Matthew Wilcox (Oracle),
	Andrew Morton, linux-mm, David Hildenbrand (Red Hat)



On 1/10/26 15:05, Bernd Schubert wrote:
> Hi Joanne,
> 
> I run in lockdep issues on testing 6.19. And I think it is due to
> holding fi->lock in fuse_writepage_end() until fuse_writepage_finish() is
> complete
> 
> Proposed patch is
> 
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 01bc894e9c2b..b2cd270c75d8 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -2000,8 +2000,8 @@ static void fuse_writepage_end(struct fuse_mount *fm, struct fuse_args *args,
>                 fuse_invalidate_attr_mask(inode, FUSE_STATX_MODIFY);
>         spin_lock(&fi->lock);
>         fi->writectr--;
> -       fuse_writepage_finish(wpa);
>         spin_unlock(&fi->lock);
> +       fuse_writepage_finish(wpa);
>         fuse_writepage_free(wpa);
>  }
>  
> 
> But then there is this comment in fuse_writepage_finish
> 
> 		/*
> 		 * Benchmarks showed that ending writeback within the
> 		 * scope of the fi->lock alleviates xarray lock
> 		 * contention and noticeably improves performance.
> 		 */
> 
> 


Hmm, actually the critical part is

[  872.499480]  Possible interrupt unsafe locking scenario:
[  872.499480] 
[  872.500326]        CPU0                    CPU1
[  872.500906]        ----                    ----
[  872.501464]   lock(&p->sequence);
[  872.501923]                                local_irq_disable();
[  872.502615]                                lock(&xa->xa_lock#4);
[  872.503327]                                lock(&p->sequence);
[  872.504116]   <Interrupt>
[  872.504513]     lock(&xa->xa_lock#4);


Which is introduced by commit 2841808f35ee for all file systems. 
The should be rather generic - I shouldn't be the only one seeing
it?

So this?

mm: fix HARDIRQ-safe -> HARDIRQ-unsafe lock order in __folio_end_writeback()

__wb_writeout_add() is called while holding xa_lock (HARDIRQ-safe),
but it eventually calls fprop_fraction_percpu() which acquires
p->sequence (HARDIRQ-unsafe via seqcount), creating a lock ordering
violation.

Call trace:
  __folio_end_writeback()
    xa_lock_irqsave(&mapping->i_pages)     <- HARDIRQ-safe
      __wb_writeout_add()
        wb_domain_writeout_add()
          __fprop_add_percpu_max()
            fprop_fraction_percpu()
              read_seqcount_begin(&p->sequence)  <- HARDIRQ-unsafe

Possible deadlock scenario:

       CPU0                    CPU1
       ----                    ----
  lock(p->sequence)
                               local_irq_disable()
                               lock(xa_lock)
                               lock(p->sequence)
  <hardirq>
    lock(xa_lock)

                   *** DEADLOCK ***

Fix by moving __wb_writeout_add() outside the xa_lock critical section.
It only accesses percpu counters and global writeback domain structures,
none of which require xa_lock protection.

Fixes: 2841808f35ee ("mm: remove BDI_CAP_WRITEBACK_ACCT")
Signed-off-by: Bernd Schubert <bschubert@ddn.com>

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index ccdeb0e84d39..ab83e3cbbf94 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2994,7 +2994,6 @@ bool __folio_end_writeback(struct folio *folio)
 
                wb = inode_to_wb(inode);
                wb_stat_mod(wb, WB_WRITEBACK, -nr);
-               __wb_writeout_add(wb, nr);
                if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK)) {
                        wb_inode_writeback_end(wb);
                        if (mapping->host)
@@ -3002,6 +3001,7 @@ bool __folio_end_writeback(struct folio *folio)
                }
 
                xa_unlock_irqrestore(&mapping->i_pages, flags);
+               __wb_writeout_add(wb, nr);
        } else {
                ret = folio_xor_flags_has_waiters(folio, 1 << PG_writeback);
        }



Thanks,
Bernd





^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: __folio_end_writeback() lockdep issue
  2026-01-10 15:31 ` __folio_end_writeback() lockdep issue Bernd Schubert
@ 2026-01-10 16:30   ` Matthew Wilcox
  2026-01-10 20:24     ` Bernd Schubert
  0 siblings, 1 reply; 3+ messages in thread
From: Matthew Wilcox @ 2026-01-10 16:30 UTC (permalink / raw)
  To: Bernd Schubert, Jan Kara
  Cc: Joanne Koong, Miklos Szeredi, Horst Birthelmer, linux-fsdevel,
	Andrew Morton, linux-mm, David Hildenbrand (Red Hat)

On Sat, Jan 10, 2026 at 04:31:28PM +0100, Bernd Schubert wrote:
> [  872.499480]  Possible interrupt unsafe locking scenario:
> [  872.499480] 
> [  872.500326]        CPU0                    CPU1
> [  872.500906]        ----                    ----
> [  872.501464]   lock(&p->sequence);
> [  872.501923]                                local_irq_disable();
> [  872.502615]                                lock(&xa->xa_lock#4);
> [  872.503327]                                lock(&p->sequence);
> [  872.504116]   <Interrupt>
> [  872.504513]     lock(&xa->xa_lock#4);
> 
> 
> Which is introduced by commit 2841808f35ee for all file systems. 
> The should be rather generic - I shouldn't be the only one seeing
> it?

Oh wow, 2841808f35ee has a very confusing commit message.  It implies
that _no_ filesystem uses BDI_CAP_WRITEBACK_ACCT, but what it really
means is that no filesystem now _clears_ BDI_CAP_WRITEBACK_ACCT, so
all filesystems do use this code path and therefore the flag can be
removed.  And that matches the code change.

So you should be able to reproduce this problem with commit 494d2f508883
as well?

That tells me that this is something fuse-specific.  Other filesystems
aren't seeing this.  Wonder why ...

__wb_writeout_add() or its predecessor __wb_writeout_inc() have been in
that spot since 2015 or earlier.  

The sequence lock itself is taken inside fprop_new_period() called from
writeout_period() which has been there since 2012, so that's not it.

Looking at fprop_new_period() is more interesting.  Commit a91befde3503
removed an earlier call to local_irq_save().  It was then replaced with
preempt_disable() in 9458e0a78c45 but maybe removing it was just
erroneous?

Anyway, that was 2022, so it doesn't answer "why is this only showing up
now and only for fuse?"  But maybe replacing the preempt-disable with
irq-disable in fprop_new_period() is the right solution, regardless.

> So this?
> 
> mm: fix HARDIRQ-safe -> HARDIRQ-unsafe lock order in __folio_end_writeback()
> 
> __wb_writeout_add() is called while holding xa_lock (HARDIRQ-safe),
> but it eventually calls fprop_fraction_percpu() which acquires
> p->sequence (HARDIRQ-unsafe via seqcount), creating a lock ordering
> violation.
> 
> Call trace:
>   __folio_end_writeback()
>     xa_lock_irqsave(&mapping->i_pages)     <- HARDIRQ-safe
>       __wb_writeout_add()
>         wb_domain_writeout_add()
>           __fprop_add_percpu_max()
>             fprop_fraction_percpu()
>               read_seqcount_begin(&p->sequence)  <- HARDIRQ-unsafe
> 
> Possible deadlock scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(p->sequence)
>                                local_irq_disable()
>                                lock(xa_lock)
>                                lock(p->sequence)
>   <hardirq>
>     lock(xa_lock)
> 
>                    *** DEADLOCK ***
> 
> Fix by moving __wb_writeout_add() outside the xa_lock critical section.
> It only accesses percpu counters and global writeback domain structures,
> none of which require xa_lock protection.
> 
> Fixes: 2841808f35ee ("mm: remove BDI_CAP_WRITEBACK_ACCT")
> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
> 
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index ccdeb0e84d39..ab83e3cbbf94 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2994,7 +2994,6 @@ bool __folio_end_writeback(struct folio *folio)
>  
>                 wb = inode_to_wb(inode);
>                 wb_stat_mod(wb, WB_WRITEBACK, -nr);
> -               __wb_writeout_add(wb, nr);
>                 if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK)) {
>                         wb_inode_writeback_end(wb);
>                         if (mapping->host)
> @@ -3002,6 +3001,7 @@ bool __folio_end_writeback(struct folio *folio)
>                 }
>  
>                 xa_unlock_irqrestore(&mapping->i_pages, flags);
> +               __wb_writeout_add(wb, nr);
>         } else {
>                 ret = folio_xor_flags_has_waiters(folio, 1 << PG_writeback);
>         }
> 
> 
> 
> Thanks,
> Bernd
> 
> 
> 


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: __folio_end_writeback() lockdep issue
  2026-01-10 16:30   ` Matthew Wilcox
@ 2026-01-10 20:24     ` Bernd Schubert
  0 siblings, 0 replies; 3+ messages in thread
From: Bernd Schubert @ 2026-01-10 20:24 UTC (permalink / raw)
  To: Matthew Wilcox, Jan Kara
  Cc: Joanne Koong, Miklos Szeredi, Horst Birthelmer, linux-fsdevel,
	Andrew Morton, linux-mm, David Hildenbrand (Red Hat)



On 1/10/26 17:30, Matthew Wilcox wrote:
> On Sat, Jan 10, 2026 at 04:31:28PM +0100, Bernd Schubert wrote:
>> [  872.499480]  Possible interrupt unsafe locking scenario:
>> [  872.499480] 
>> [  872.500326]        CPU0                    CPU1
>> [  872.500906]        ----                    ----
>> [  872.501464]   lock(&p->sequence);
>> [  872.501923]                                local_irq_disable();
>> [  872.502615]                                lock(&xa->xa_lock#4);
>> [  872.503327]                                lock(&p->sequence);
>> [  872.504116]   <Interrupt>
>> [  872.504513]     lock(&xa->xa_lock#4);
>>
>>
>> Which is introduced by commit 2841808f35ee for all file systems. 
>> The should be rather generic - I shouldn't be the only one seeing
>> it?
> 
> Oh wow, 2841808f35ee has a very confusing commit message.  It implies
> that _no_ filesystem uses BDI_CAP_WRITEBACK_ACCT, but what it really
> means is that no filesystem now _clears_ BDI_CAP_WRITEBACK_ACCT, so
> all filesystems do use this code path and therefore the flag can be
> removed.  And that matches the code change.

Ah right, I had mixed it up, fuse was actually clearing
BDI_CAP_WRITEBACK_ACCT in the past.

> 
> So you should be able to reproduce this problem with commit 494d2f508883
> as well?

Yep, reproducible.

> 
> That tells me that this is something fuse-specific.  Other filesystems
> aren't seeing this.  Wonder why ...
> 
> __wb_writeout_add() or its predecessor __wb_writeout_inc() have been in
> that spot since 2015 or earlier.  
> 
> The sequence lock itself is taken inside fprop_new_period() called from
> writeout_period() which has been there since 2012, so that's not it.
> 
> Looking at fprop_new_period() is more interesting.  Commit a91befde3503
> removed an earlier call to local_irq_save().  It was then replaced with
> preempt_disable() in 9458e0a78c45 but maybe removing it was just
> erroneous?
> 
> Anyway, that was 2022, so it doesn't answer "why is this only showing up
> now and only for fuse?"  But maybe replacing the preempt-disable with
> irq-disable in fprop_new_period() is the right solution, regardless.

With fuse tmp pages mapping was NULL in past? I.e. I *guess* the trigger
is 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb
tree"), although I'm confused why I didn't run into this earlier.


Bernd



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-01-10 20:24 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <9b845a47-9aee-43dd-99bc-1a82bea00442@bsbernd.com>
2026-01-10 15:31 ` __folio_end_writeback() lockdep issue Bernd Schubert
2026-01-10 16:30   ` Matthew Wilcox
2026-01-10 20:24     ` Bernd Schubert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox