Re: __folio_end_writeback() lockdep issue

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* Re: __folio_end_writeback() lockdep issue
       [not found] <9b845a47-9aee-43dd-99bc-1a82bea00442@bsbernd.com>
@ 2026-01-10 15:31 ` Bernd Schubert
  2026-01-10 16:30   ` Matthew Wilcox
  0 siblings, 1 reply; 8+ messages in thread
From: Bernd Schubert @ 2026-01-10 15:31 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Miklos Szeredi, Horst Birthelmer, linux-fsdevel,
	Matthew Wilcox (Oracle),
	Andrew Morton, linux-mm, David Hildenbrand (Red Hat)



On 1/10/26 15:05, Bernd Schubert wrote:
> Hi Joanne,
> 
> I run in lockdep issues on testing 6.19. And I think it is due to
> holding fi->lock in fuse_writepage_end() until fuse_writepage_finish() is
> complete
> 
> Proposed patch is
> 
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 01bc894e9c2b..b2cd270c75d8 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -2000,8 +2000,8 @@ static void fuse_writepage_end(struct fuse_mount *fm, struct fuse_args *args,
>                 fuse_invalidate_attr_mask(inode, FUSE_STATX_MODIFY);
>         spin_lock(&fi->lock);
>         fi->writectr--;
> -       fuse_writepage_finish(wpa);
>         spin_unlock(&fi->lock);
> +       fuse_writepage_finish(wpa);
>         fuse_writepage_free(wpa);
>  }
>  
> 
> But then there is this comment in fuse_writepage_finish
> 
> 		/*
> 		 * Benchmarks showed that ending writeback within the
> 		 * scope of the fi->lock alleviates xarray lock
> 		 * contention and noticeably improves performance.
> 		 */
> 
> 


Hmm, actually the critical part is

[  872.499480]  Possible interrupt unsafe locking scenario:
[  872.499480] 
[  872.500326]        CPU0                    CPU1
[  872.500906]        ----                    ----
[  872.501464]   lock(&p->sequence);
[  872.501923]                                local_irq_disable();
[  872.502615]                                lock(&xa->xa_lock#4);
[  872.503327]                                lock(&p->sequence);
[  872.504116]   <Interrupt>
[  872.504513]     lock(&xa->xa_lock#4);


Which is introduced by commit 2841808f35ee for all file systems. 
The should be rather generic - I shouldn't be the only one seeing
it?

So this?

mm: fix HARDIRQ-safe -> HARDIRQ-unsafe lock order in __folio_end_writeback()

__wb_writeout_add() is called while holding xa_lock (HARDIRQ-safe),
but it eventually calls fprop_fraction_percpu() which acquires
p->sequence (HARDIRQ-unsafe via seqcount), creating a lock ordering
violation.

Call trace:
  __folio_end_writeback()
    xa_lock_irqsave(&mapping->i_pages)     <- HARDIRQ-safe
      __wb_writeout_add()
        wb_domain_writeout_add()
          __fprop_add_percpu_max()
            fprop_fraction_percpu()
              read_seqcount_begin(&p->sequence)  <- HARDIRQ-unsafe

Possible deadlock scenario:

       CPU0                    CPU1
       ----                    ----
  lock(p->sequence)
                               local_irq_disable()
                               lock(xa_lock)
                               lock(p->sequence)
  <hardirq>
    lock(xa_lock)

                   *** DEADLOCK ***

Fix by moving __wb_writeout_add() outside the xa_lock critical section.
It only accesses percpu counters and global writeback domain structures,
none of which require xa_lock protection.

Fixes: 2841808f35ee ("mm: remove BDI_CAP_WRITEBACK_ACCT")
Signed-off-by: Bernd Schubert <bschubert@ddn.com>

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index ccdeb0e84d39..ab83e3cbbf94 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2994,7 +2994,6 @@ bool __folio_end_writeback(struct folio *folio)
 
                wb = inode_to_wb(inode);
                wb_stat_mod(wb, WB_WRITEBACK, -nr);
-               __wb_writeout_add(wb, nr);
                if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK)) {
                        wb_inode_writeback_end(wb);
                        if (mapping->host)
@@ -3002,6 +3001,7 @@ bool __folio_end_writeback(struct folio *folio)
                }
 
                xa_unlock_irqrestore(&mapping->i_pages, flags);
+               __wb_writeout_add(wb, nr);
        } else {
                ret = folio_xor_flags_has_waiters(folio, 1 << PG_writeback);
        }



Thanks,
Bernd





^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: __folio_end_writeback() lockdep issue
  2026-01-10 15:31 ` __folio_end_writeback() lockdep issue Bernd Schubert
@ 2026-01-10 16:30   ` Matthew Wilcox
  2026-01-10 20:24     ` Bernd Schubert
  2026-01-12 13:06     ` Jan Kara
  0 siblings, 2 replies; 8+ messages in thread
From: Matthew Wilcox @ 2026-01-10 16:30 UTC (permalink / raw)
  To: Bernd Schubert, Jan Kara
  Cc: Joanne Koong, Miklos Szeredi, Horst Birthelmer, linux-fsdevel,
	Andrew Morton, linux-mm, David Hildenbrand (Red Hat)

On Sat, Jan 10, 2026 at 04:31:28PM +0100, Bernd Schubert wrote:
> [  872.499480]  Possible interrupt unsafe locking scenario:
> [  872.499480] 
> [  872.500326]        CPU0                    CPU1
> [  872.500906]        ----                    ----
> [  872.501464]   lock(&p->sequence);
> [  872.501923]                                local_irq_disable();
> [  872.502615]                                lock(&xa->xa_lock#4);
> [  872.503327]                                lock(&p->sequence);
> [  872.504116]   <Interrupt>
> [  872.504513]     lock(&xa->xa_lock#4);
> 
> 
> Which is introduced by commit 2841808f35ee for all file systems. 
> The should be rather generic - I shouldn't be the only one seeing
> it?

Oh wow, 2841808f35ee has a very confusing commit message.  It implies
that _no_ filesystem uses BDI_CAP_WRITEBACK_ACCT, but what it really
means is that no filesystem now _clears_ BDI_CAP_WRITEBACK_ACCT, so
all filesystems do use this code path and therefore the flag can be
removed.  And that matches the code change.

So you should be able to reproduce this problem with commit 494d2f508883
as well?

That tells me that this is something fuse-specific.  Other filesystems
aren't seeing this.  Wonder why ...

__wb_writeout_add() or its predecessor __wb_writeout_inc() have been in
that spot since 2015 or earlier.  

The sequence lock itself is taken inside fprop_new_period() called from
writeout_period() which has been there since 2012, so that's not it.

Looking at fprop_new_period() is more interesting.  Commit a91befde3503
removed an earlier call to local_irq_save().  It was then replaced with
preempt_disable() in 9458e0a78c45 but maybe removing it was just
erroneous?

Anyway, that was 2022, so it doesn't answer "why is this only showing up
now and only for fuse?"  But maybe replacing the preempt-disable with
irq-disable in fprop_new_period() is the right solution, regardless.

> So this?
> 
> mm: fix HARDIRQ-safe -> HARDIRQ-unsafe lock order in __folio_end_writeback()
> 
> __wb_writeout_add() is called while holding xa_lock (HARDIRQ-safe),
> but it eventually calls fprop_fraction_percpu() which acquires
> p->sequence (HARDIRQ-unsafe via seqcount), creating a lock ordering
> violation.
> 
> Call trace:
>   __folio_end_writeback()
>     xa_lock_irqsave(&mapping->i_pages)     <- HARDIRQ-safe
>       __wb_writeout_add()
>         wb_domain_writeout_add()
>           __fprop_add_percpu_max()
>             fprop_fraction_percpu()
>               read_seqcount_begin(&p->sequence)  <- HARDIRQ-unsafe
> 
> Possible deadlock scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(p->sequence)
>                                local_irq_disable()
>                                lock(xa_lock)
>                                lock(p->sequence)
>   <hardirq>
>     lock(xa_lock)
> 
>                    *** DEADLOCK ***
> 
> Fix by moving __wb_writeout_add() outside the xa_lock critical section.
> It only accesses percpu counters and global writeback domain structures,
> none of which require xa_lock protection.
> 
> Fixes: 2841808f35ee ("mm: remove BDI_CAP_WRITEBACK_ACCT")
> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
> 
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index ccdeb0e84d39..ab83e3cbbf94 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2994,7 +2994,6 @@ bool __folio_end_writeback(struct folio *folio)
>  
>                 wb = inode_to_wb(inode);
>                 wb_stat_mod(wb, WB_WRITEBACK, -nr);
> -               __wb_writeout_add(wb, nr);
>                 if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK)) {
>                         wb_inode_writeback_end(wb);
>                         if (mapping->host)
> @@ -3002,6 +3001,7 @@ bool __folio_end_writeback(struct folio *folio)
>                 }
>  
>                 xa_unlock_irqrestore(&mapping->i_pages, flags);
> +               __wb_writeout_add(wb, nr);
>         } else {
>                 ret = folio_xor_flags_has_waiters(folio, 1 << PG_writeback);
>         }
> 
> 
> 
> Thanks,
> Bernd
> 
> 
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: __folio_end_writeback() lockdep issue
  2026-01-10 16:30   ` Matthew Wilcox
@ 2026-01-10 20:24     ` Bernd Schubert
  2026-01-12 13:06     ` Jan Kara
  1 sibling, 0 replies; 8+ messages in thread
From: Bernd Schubert @ 2026-01-10 20:24 UTC (permalink / raw)
  To: Matthew Wilcox, Jan Kara
  Cc: Joanne Koong, Miklos Szeredi, Horst Birthelmer, linux-fsdevel,
	Andrew Morton, linux-mm, David Hildenbrand (Red Hat)



On 1/10/26 17:30, Matthew Wilcox wrote:
> On Sat, Jan 10, 2026 at 04:31:28PM +0100, Bernd Schubert wrote:
>> [  872.499480]  Possible interrupt unsafe locking scenario:
>> [  872.499480] 
>> [  872.500326]        CPU0                    CPU1
>> [  872.500906]        ----                    ----
>> [  872.501464]   lock(&p->sequence);
>> [  872.501923]                                local_irq_disable();
>> [  872.502615]                                lock(&xa->xa_lock#4);
>> [  872.503327]                                lock(&p->sequence);
>> [  872.504116]   <Interrupt>
>> [  872.504513]     lock(&xa->xa_lock#4);
>>
>>
>> Which is introduced by commit 2841808f35ee for all file systems. 
>> The should be rather generic - I shouldn't be the only one seeing
>> it?
> 
> Oh wow, 2841808f35ee has a very confusing commit message.  It implies
> that _no_ filesystem uses BDI_CAP_WRITEBACK_ACCT, but what it really
> means is that no filesystem now _clears_ BDI_CAP_WRITEBACK_ACCT, so
> all filesystems do use this code path and therefore the flag can be
> removed.  And that matches the code change.

Ah right, I had mixed it up, fuse was actually clearing
BDI_CAP_WRITEBACK_ACCT in the past.

> 
> So you should be able to reproduce this problem with commit 494d2f508883
> as well?

Yep, reproducible.

> 
> That tells me that this is something fuse-specific.  Other filesystems
> aren't seeing this.  Wonder why ...
> 
> __wb_writeout_add() or its predecessor __wb_writeout_inc() have been in
> that spot since 2015 or earlier.  
> 
> The sequence lock itself is taken inside fprop_new_period() called from
> writeout_period() which has been there since 2012, so that's not it.
> 
> Looking at fprop_new_period() is more interesting.  Commit a91befde3503
> removed an earlier call to local_irq_save().  It was then replaced with
> preempt_disable() in 9458e0a78c45 but maybe removing it was just
> erroneous?
> 
> Anyway, that was 2022, so it doesn't answer "why is this only showing up
> now and only for fuse?"  But maybe replacing the preempt-disable with
> irq-disable in fprop_new_period() is the right solution, regardless.

With fuse tmp pages mapping was NULL in past? I.e. I *guess* the trigger
is 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb
tree"), although I'm confused why I didn't run into this earlier.


Bernd



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: __folio_end_writeback() lockdep issue
  2026-01-10 16:30   ` Matthew Wilcox
  2026-01-10 20:24     ` Bernd Schubert
@ 2026-01-12 13:06     ` Jan Kara
  2026-01-12 13:13       ` Bernd Schubert
                         ` (2 more replies)
  1 sibling, 3 replies; 8+ messages in thread
From: Jan Kara @ 2026-01-12 13:06 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Bernd Schubert, Jan Kara, Joanne Koong, Miklos Szeredi,
	Horst Birthelmer, linux-fsdevel, Andrew Morton, linux-mm,
	David Hildenbrand (Red Hat)

On Sat 10-01-26 16:30:28, Matthew Wilcox wrote:
> On Sat, Jan 10, 2026 at 04:31:28PM +0100, Bernd Schubert wrote:
> > [  872.499480]  Possible interrupt unsafe locking scenario:
> > [  872.499480] 
> > [  872.500326]        CPU0                    CPU1
> > [  872.500906]        ----                    ----
> > [  872.501464]   lock(&p->sequence);
> > [  872.501923]                                local_irq_disable();
> > [  872.502615]                                lock(&xa->xa_lock#4);
> > [  872.503327]                                lock(&p->sequence);
> > [  872.504116]   <Interrupt>
> > [  872.504513]     lock(&xa->xa_lock#4);
> > 
> > 
> > Which is introduced by commit 2841808f35ee for all file systems. 
> > The should be rather generic - I shouldn't be the only one seeing
> > it?
> 
> Oh wow, 2841808f35ee has a very confusing commit message.  It implies
> that _no_ filesystem uses BDI_CAP_WRITEBACK_ACCT, but what it really
> means is that no filesystem now _clears_ BDI_CAP_WRITEBACK_ACCT, so
> all filesystems do use this code path and therefore the flag can be
> removed.  And that matches the code change.
> 
> So you should be able to reproduce this problem with commit 494d2f508883
> as well?
> 
> That tells me that this is something fuse-specific.  Other filesystems
> aren't seeing this.  Wonder why ...
> 
> __wb_writeout_add() or its predecessor __wb_writeout_inc() have been in
> that spot since 2015 or earlier.  
> 
> The sequence lock itself is taken inside fprop_new_period() called from
> writeout_period() which has been there since 2012, so that's not it.
> 
> Looking at fprop_new_period() is more interesting.  Commit a91befde3503
> removed an earlier call to local_irq_save().  It was then replaced with
> preempt_disable() in 9458e0a78c45 but maybe removing it was just
> erroneous?
> 
> Anyway, that was 2022, so it doesn't answer "why is this only showing up
> now and only for fuse?"  But maybe replacing the preempt-disable with
> irq-disable in fprop_new_period() is the right solution, regardless.

So I don't have a great explanation why it is showing up only now and only
for FUSE. It seems the fprop code is unsafe wrt interrupts because
fprop_new_period() grabs

        write_seqcount_begin(&p->sequence);

and if IO completion interrupt on this CPU comes while p->sequence is odd,
the call to

	read_seqcount_begin(&p->sequence);

in __folio_end_writeback() -> __wb_writeout_add() -> wb_domain_writeout_add()
-> __fprop_add_percpu_max() -> fprop_fraction_percpu() will loop
indefinitely. *However* this isn't in fact possible because
fprop_new_period() is only called from a timer code and thus in softirq
context and thus IO completion softirq cannot really preempt it.

But for the same reason I don't think what lockdep complains about is
really possible because xa_lock gets only used from IO completion softirq as
well. Or can we really acquire it from some hard irq context? Based on
lockdep report at least lockdep things IO completion runs in hardirq
context but then I don't see why we're not seeing complaints like this all
the time and even deadlocks I've described above. I guess I'll have to do
some experimentation to refresh how these things behave these days...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: __folio_end_writeback() lockdep issue
  2026-01-12 13:06     ` Jan Kara
@ 2026-01-12 13:13       ` Bernd Schubert
  2026-01-12 17:20         ` Jan Kara
  2026-01-12 20:43       ` Joanne Koong
  2026-01-20 14:35       ` Jan Kara
  2 siblings, 1 reply; 8+ messages in thread
From: Bernd Schubert @ 2026-01-12 13:13 UTC (permalink / raw)
  To: Jan Kara, Matthew Wilcox
  Cc: Joanne Koong, Miklos Szeredi, Horst Birthelmer, linux-fsdevel,
	Andrew Morton, linux-mm, David Hildenbrand (Red Hat)



On 1/12/26 14:06, Jan Kara wrote:
> On Sat 10-01-26 16:30:28, Matthew Wilcox wrote:
>> On Sat, Jan 10, 2026 at 04:31:28PM +0100, Bernd Schubert wrote:
>>> [  872.499480]  Possible interrupt unsafe locking scenario:
>>> [  872.499480] 
>>> [  872.500326]        CPU0                    CPU1
>>> [  872.500906]        ----                    ----
>>> [  872.501464]   lock(&p->sequence);
>>> [  872.501923]                                local_irq_disable();
>>> [  872.502615]                                lock(&xa->xa_lock#4);
>>> [  872.503327]                                lock(&p->sequence);
>>> [  872.504116]   <Interrupt>
>>> [  872.504513]     lock(&xa->xa_lock#4);
>>>
>>>
>>> Which is introduced by commit 2841808f35ee for all file systems. 
>>> The should be rather generic - I shouldn't be the only one seeing
>>> it?
>>
>> Oh wow, 2841808f35ee has a very confusing commit message.  It implies
>> that _no_ filesystem uses BDI_CAP_WRITEBACK_ACCT, but what it really
>> means is that no filesystem now _clears_ BDI_CAP_WRITEBACK_ACCT, so
>> all filesystems do use this code path and therefore the flag can be
>> removed.  And that matches the code change.
>>
>> So you should be able to reproduce this problem with commit 494d2f508883
>> as well?
>>
>> That tells me that this is something fuse-specific.  Other filesystems
>> aren't seeing this.  Wonder why ...
>>
>> __wb_writeout_add() or its predecessor __wb_writeout_inc() have been in
>> that spot since 2015 or earlier.  
>>
>> The sequence lock itself is taken inside fprop_new_period() called from
>> writeout_period() which has been there since 2012, so that's not it.
>>
>> Looking at fprop_new_period() is more interesting.  Commit a91befde3503
>> removed an earlier call to local_irq_save().  It was then replaced with
>> preempt_disable() in 9458e0a78c45 but maybe removing it was just
>> erroneous?
>>
>> Anyway, that was 2022, so it doesn't answer "why is this only showing up
>> now and only for fuse?"  But maybe replacing the preempt-disable with
>> irq-disable in fprop_new_period() is the right solution, regardless.
> 
> So I don't have a great explanation why it is showing up only now and only
> for FUSE. It seems the fprop code is unsafe wrt interrupts because
> fprop_new_period() grabs
> 
>         write_seqcount_begin(&p->sequence);
> 
> and if IO completion interrupt on this CPU comes while p->sequence is odd,
> the call to
> 
> 	read_seqcount_begin(&p->sequence);
> 
> in __folio_end_writeback() -> __wb_writeout_add() -> wb_domain_writeout_add()
> -> __fprop_add_percpu_max() -> fprop_fraction_percpu() will loop
> indefinitely. *However* this isn't in fact possible because
> fprop_new_period() is only called from a timer code and thus in softirq
> context and thus IO completion softirq cannot really preempt it.
> 
> But for the same reason I don't think what lockdep complains about is
> really possible because xa_lock gets only used from IO completion softirq as
> well. Or can we really acquire it from some hard irq context? Based on
> lockdep report at least lockdep things IO completion runs in hardirq
> context but then I don't see why we're not seeing complaints like this all
> the time and even deadlocks I've described above. I guess I'll have to do
> some experimentation to refresh how these things behave these days...
> 
> 								Honza

Is there anything that speaks about the patch I had posted?
__wb_writeout_add() doesn't need the xa lock?


Thanks,
Bernd


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: __folio_end_writeback() lockdep issue
  2026-01-12 13:13       ` Bernd Schubert
@ 2026-01-12 17:20         ` Jan Kara
  0 siblings, 0 replies; 8+ messages in thread
From: Jan Kara @ 2026-01-12 17:20 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Jan Kara, Matthew Wilcox, Joanne Koong, Miklos Szeredi,
	Horst Birthelmer, linux-fsdevel, Andrew Morton, linux-mm,
	David Hildenbrand (Red Hat)

On Mon 12-01-26 14:13:26, Bernd Schubert wrote:
> 
> 
> On 1/12/26 14:06, Jan Kara wrote:
> > On Sat 10-01-26 16:30:28, Matthew Wilcox wrote:
> >> On Sat, Jan 10, 2026 at 04:31:28PM +0100, Bernd Schubert wrote:
> >>> [  872.499480]  Possible interrupt unsafe locking scenario:
> >>> [  872.499480] 
> >>> [  872.500326]        CPU0                    CPU1
> >>> [  872.500906]        ----                    ----
> >>> [  872.501464]   lock(&p->sequence);
> >>> [  872.501923]                                local_irq_disable();
> >>> [  872.502615]                                lock(&xa->xa_lock#4);
> >>> [  872.503327]                                lock(&p->sequence);
> >>> [  872.504116]   <Interrupt>
> >>> [  872.504513]     lock(&xa->xa_lock#4);
> >>>
> >>>
> >>> Which is introduced by commit 2841808f35ee for all file systems. 
> >>> The should be rather generic - I shouldn't be the only one seeing
> >>> it?
> >>
> >> Oh wow, 2841808f35ee has a very confusing commit message.  It implies
> >> that _no_ filesystem uses BDI_CAP_WRITEBACK_ACCT, but what it really
> >> means is that no filesystem now _clears_ BDI_CAP_WRITEBACK_ACCT, so
> >> all filesystems do use this code path and therefore the flag can be
> >> removed.  And that matches the code change.
> >>
> >> So you should be able to reproduce this problem with commit 494d2f508883
> >> as well?
> >>
> >> That tells me that this is something fuse-specific.  Other filesystems
> >> aren't seeing this.  Wonder why ...
> >>
> >> __wb_writeout_add() or its predecessor __wb_writeout_inc() have been in
> >> that spot since 2015 or earlier.  
> >>
> >> The sequence lock itself is taken inside fprop_new_period() called from
> >> writeout_period() which has been there since 2012, so that's not it.
> >>
> >> Looking at fprop_new_period() is more interesting.  Commit a91befde3503
> >> removed an earlier call to local_irq_save().  It was then replaced with
> >> preempt_disable() in 9458e0a78c45 but maybe removing it was just
> >> erroneous?
> >>
> >> Anyway, that was 2022, so it doesn't answer "why is this only showing up
> >> now and only for fuse?"  But maybe replacing the preempt-disable with
> >> irq-disable in fprop_new_period() is the right solution, regardless.
> > 
> > So I don't have a great explanation why it is showing up only now and only
> > for FUSE. It seems the fprop code is unsafe wrt interrupts because
> > fprop_new_period() grabs
> > 
> >         write_seqcount_begin(&p->sequence);
> > 
> > and if IO completion interrupt on this CPU comes while p->sequence is odd,
> > the call to
> > 
> > 	read_seqcount_begin(&p->sequence);
> > 
> > in __folio_end_writeback() -> __wb_writeout_add() -> wb_domain_writeout_add()
> > -> __fprop_add_percpu_max() -> fprop_fraction_percpu() will loop
> > indefinitely. *However* this isn't in fact possible because
> > fprop_new_period() is only called from a timer code and thus in softirq
> > context and thus IO completion softirq cannot really preempt it.
> > 
> > But for the same reason I don't think what lockdep complains about is
> > really possible because xa_lock gets only used from IO completion softirq as
> > well. Or can we really acquire it from some hard irq context? Based on
> > lockdep report at least lockdep things IO completion runs in hardirq
> > context but then I don't see why we're not seeing complaints like this all
> > the time and even deadlocks I've described above. I guess I'll have to do
> > some experimentation to refresh how these things behave these days...
> 
> Is there anything that speaks about the patch I had posted?
> __wb_writeout_add() doesn't need the xa lock?

That's a bit hairy question :) because currently xa lock is what makes sure
the wb we've got from the inode is still valid. Now that you've forced me
to think about it: If we move __wb_writeout_add() from under xa lock, inode
could be switched to different wb after we release xa lock and before we
call __wb_writeout_add(). I don't think there is anything that would
protect the wb from being freed in that case before __wb_writeout_add() and
thus we could create UAF issue if I'm not mistaken.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: __folio_end_writeback() lockdep issue
  2026-01-12 13:06     ` Jan Kara
  2026-01-12 13:13       ` Bernd Schubert
@ 2026-01-12 20:43       ` Joanne Koong
  2026-01-20 14:35       ` Jan Kara
  2 siblings, 0 replies; 8+ messages in thread
From: Joanne Koong @ 2026-01-12 20:43 UTC (permalink / raw)
  To: Jan Kara
  Cc: Matthew Wilcox, Bernd Schubert, Miklos Szeredi, Horst Birthelmer,
	linux-fsdevel, Andrew Morton, linux-mm,
	David Hildenbrand (Red Hat),
	hdanton

On Mon, Jan 12, 2026 at 5:06 AM Jan Kara <jack@suse.cz> wrote:
>
> On Sat 10-01-26 16:30:28, Matthew Wilcox wrote:
> > On Sat, Jan 10, 2026 at 04:31:28PM +0100, Bernd Schubert wrote:
> > > [  872.499480]  Possible interrupt unsafe locking scenario:
> > > [  872.499480]
> > > [  872.500326]        CPU0                    CPU1
> > > [  872.500906]        ----                    ----
> > > [  872.501464]   lock(&p->sequence);
> > > [  872.501923]                                local_irq_disable();
> > > [  872.502615]                                lock(&xa->xa_lock#4);
> > > [  872.503327]                                lock(&p->sequence);
> > > [  872.504116]   <Interrupt>
> > > [  872.504513]     lock(&xa->xa_lock#4);
> > >
> > >
> > > Which is introduced by commit 2841808f35ee for all file systems.
> > > The should be rather generic - I shouldn't be the only one seeing
> > > it?
> >
> > Oh wow, 2841808f35ee has a very confusing commit message.  It implies
> > that _no_ filesystem uses BDI_CAP_WRITEBACK_ACCT, but what it really
> > means is that no filesystem now _clears_ BDI_CAP_WRITEBACK_ACCT, so
> > all filesystems do use this code path and therefore the flag can be
> > removed.  And that matches the code change.
> >
> > So you should be able to reproduce this problem with commit 494d2f508883
> > as well?
> >
> > That tells me that this is something fuse-specific.  Other filesystems
> > aren't seeing this.  Wonder why ...
> >
> > __wb_writeout_add() or its predecessor __wb_writeout_inc() have been in
> > that spot since 2015 or earlier.
> >
> > The sequence lock itself is taken inside fprop_new_period() called from
> > writeout_period() which has been there since 2012, so that's not it.
> >
> > Looking at fprop_new_period() is more interesting.  Commit a91befde3503
> > removed an earlier call to local_irq_save().  It was then replaced with
> > preempt_disable() in 9458e0a78c45 but maybe removing it was just
> > erroneous?
> >
> > Anyway, that was 2022, so it doesn't answer "why is this only showing up
> > now and only for fuse?"  But maybe replacing the preempt-disable with
> > irq-disable in fprop_new_period() is the right solution, regardless.
>
> So I don't have a great explanation why it is showing up only now and only
> for FUSE. It seems the fprop code is unsafe wrt interrupts because
> fprop_new_period() grabs
>
>         write_seqcount_begin(&p->sequence);
>
> and if IO completion interrupt on this CPU comes while p->sequence is odd,
> the call to
>
>         read_seqcount_begin(&p->sequence);
>
> in __folio_end_writeback() -> __wb_writeout_add() -> wb_domain_writeout_add()
> -> __fprop_add_percpu_max() -> fprop_fraction_percpu() will loop
> indefinitely. *However* this isn't in fact possible because
> fprop_new_period() is only called from a timer code and thus in softirq
> context and thus IO completion softirq cannot really preempt it.
>
> But for the same reason I don't think what lockdep complains about is
> really possible because xa_lock gets only used from IO completion softirq as
> well. Or can we really acquire it from some hard irq context? Based on
> lockdep report at least lockdep things IO completion runs in hardirq
> context but then I don't see why we're not seeing complaints like this all
> the time and even deadlocks I've described above. I guess I'll have to do
> some experimentation to refresh how these things behave these days...

I looked into this briefly when there was a syzbot report about this
in october [1]. The stack trace is a little different but I think it's
pretty much the exact same scenario.

I came to the same conclusion Jan came to. Copying over my notes from
October: "I don't think this is a real deadlock. My understanding of
it is that the other thread where the timer is running that is calling
fprop_new_period() could only try grabbing the xa_array lock in a
hardware interrupt handler since it disables preemption and is already
running in softirq context which means it can't get interrupted by
another softirq or scheduled out and the hardware interrupt handling
code in the drivers don't access the page cache directly."

It's unclear to me why it's only complaining about this for FUSE.

This seemed like a false positive scenario to me, but I also think the
patches Hilf tried out in [1] seemed reasonable.

Thanks,
Joanne

[1] https://lore.kernel.org/all/68e583e1.a00a0220.298cc0.0485.GAE@google.com/

>
>                                                                 Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: __folio_end_writeback() lockdep issue
  2026-01-12 13:06     ` Jan Kara
  2026-01-12 13:13       ` Bernd Schubert
  2026-01-12 20:43       ` Joanne Koong
@ 2026-01-20 14:35       ` Jan Kara
  2 siblings, 0 replies; 8+ messages in thread
From: Jan Kara @ 2026-01-20 14:35 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Bernd Schubert, Jan Kara, Joanne Koong, Miklos Szeredi,
	Horst Birthelmer, linux-fsdevel, Andrew Morton, linux-mm,
	David Hildenbrand (Red Hat)

On Mon 12-01-26 14:06:26, Jan Kara wrote:
> On Sat 10-01-26 16:30:28, Matthew Wilcox wrote:
> > On Sat, Jan 10, 2026 at 04:31:28PM +0100, Bernd Schubert wrote:
> > > [  872.499480]  Possible interrupt unsafe locking scenario:
> > > [  872.499480] 
> > > [  872.500326]        CPU0                    CPU1
> > > [  872.500906]        ----                    ----
> > > [  872.501464]   lock(&p->sequence);
> > > [  872.501923]                                local_irq_disable();
> > > [  872.502615]                                lock(&xa->xa_lock#4);
> > > [  872.503327]                                lock(&p->sequence);
> > > [  872.504116]   <Interrupt>
> > > [  872.504513]     lock(&xa->xa_lock#4);
> > > 
> > > 
> > > Which is introduced by commit 2841808f35ee for all file systems. 
> > > The should be rather generic - I shouldn't be the only one seeing
> > > it?
> > 
> > Oh wow, 2841808f35ee has a very confusing commit message.  It implies
> > that _no_ filesystem uses BDI_CAP_WRITEBACK_ACCT, but what it really
> > means is that no filesystem now _clears_ BDI_CAP_WRITEBACK_ACCT, so
> > all filesystems do use this code path and therefore the flag can be
> > removed.  And that matches the code change.
> > 
> > So you should be able to reproduce this problem with commit 494d2f508883
> > as well?
> > 
> > That tells me that this is something fuse-specific.  Other filesystems
> > aren't seeing this.  Wonder why ...
> > 
> > __wb_writeout_add() or its predecessor __wb_writeout_inc() have been in
> > that spot since 2015 or earlier.  
> > 
> > The sequence lock itself is taken inside fprop_new_period() called from
> > writeout_period() which has been there since 2012, so that's not it.
> > 
> > Looking at fprop_new_period() is more interesting.  Commit a91befde3503
> > removed an earlier call to local_irq_save().  It was then replaced with
> > preempt_disable() in 9458e0a78c45 but maybe removing it was just
> > erroneous?
> > 
> > Anyway, that was 2022, so it doesn't answer "why is this only showing up
> > now and only for fuse?"  But maybe replacing the preempt-disable with
> > irq-disable in fprop_new_period() is the right solution, regardless.
> 
> So I don't have a great explanation why it is showing up only now and only
> for FUSE. It seems the fprop code is unsafe wrt interrupts because
> fprop_new_period() grabs
> 
>         write_seqcount_begin(&p->sequence);
> 
> and if IO completion interrupt on this CPU comes while p->sequence is odd,
> the call to
> 
> 	read_seqcount_begin(&p->sequence);
> 
> in __folio_end_writeback() -> __wb_writeout_add() -> wb_domain_writeout_add()
> -> __fprop_add_percpu_max() -> fprop_fraction_percpu() will loop
> indefinitely. *However* this isn't in fact possible because
> fprop_new_period() is only called from a timer code and thus in softirq
> context and thus IO completion softirq cannot really preempt it.
> 
> But for the same reason I don't think what lockdep complains about is
> really possible because xa_lock gets only used from IO completion softirq as
> well. Or can we really acquire it from some hard irq context? Based on
> lockdep report at least lockdep things IO completion runs in hardirq
> context but then I don't see why we're not seeing complaints like this all
> the time and even deadlocks I've described above. I guess I'll have to do
> some experimentation to refresh how these things behave these days...

So I've got to experiment with this some more. It appears that my
recollection of block layer IO completion code is about decade old and
these days bio completion routines (and thus __folio_end_writeback()) are
getting called in hardirq context in most cases (verified this with ftrace).
Hence the flexible proportions code is indeed prone to deadlocks with
hardirqs. I'm just wondering why aren't we seeing much more lockdep reports
about this and possibly also real deadlocks... Anyway, I'll send a patch to
make fprop_new_period() irq safe.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-01-20 14:35 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <9b845a47-9aee-43dd-99bc-1a82bea00442@bsbernd.com>
2026-01-10 15:31 ` __folio_end_writeback() lockdep issue Bernd Schubert
2026-01-10 16:30   ` Matthew Wilcox
2026-01-10 20:24     ` Bernd Schubert
2026-01-12 13:06     ` Jan Kara
2026-01-12 13:13       ` Bernd Schubert
2026-01-12 17:20         ` Jan Kara
2026-01-12 20:43       ` Joanne Koong
2026-01-20 14:35       ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox