* Re: __folio_end_writeback() lockdep issue
2026-01-12 13:06 ` Jan Kara
@ 2026-01-12 13:13 ` Bernd Schubert
2026-01-12 17:20 ` Jan Kara
2026-01-12 20:43 ` Joanne Koong
2026-01-20 14:35 ` Jan Kara
2 siblings, 1 reply; 8+ messages in thread
From: Bernd Schubert @ 2026-01-12 13:13 UTC (permalink / raw)
To: Jan Kara, Matthew Wilcox
Cc: Joanne Koong, Miklos Szeredi, Horst Birthelmer, linux-fsdevel,
Andrew Morton, linux-mm, David Hildenbrand (Red Hat)
On 1/12/26 14:06, Jan Kara wrote:
> On Sat 10-01-26 16:30:28, Matthew Wilcox wrote:
>> On Sat, Jan 10, 2026 at 04:31:28PM +0100, Bernd Schubert wrote:
>>> [ 872.499480] Possible interrupt unsafe locking scenario:
>>> [ 872.499480]
>>> [ 872.500326] CPU0 CPU1
>>> [ 872.500906] ---- ----
>>> [ 872.501464] lock(&p->sequence);
>>> [ 872.501923] local_irq_disable();
>>> [ 872.502615] lock(&xa->xa_lock#4);
>>> [ 872.503327] lock(&p->sequence);
>>> [ 872.504116] <Interrupt>
>>> [ 872.504513] lock(&xa->xa_lock#4);
>>>
>>>
>>> Which is introduced by commit 2841808f35ee for all file systems.
>>> The should be rather generic - I shouldn't be the only one seeing
>>> it?
>>
>> Oh wow, 2841808f35ee has a very confusing commit message. It implies
>> that _no_ filesystem uses BDI_CAP_WRITEBACK_ACCT, but what it really
>> means is that no filesystem now _clears_ BDI_CAP_WRITEBACK_ACCT, so
>> all filesystems do use this code path and therefore the flag can be
>> removed. And that matches the code change.
>>
>> So you should be able to reproduce this problem with commit 494d2f508883
>> as well?
>>
>> That tells me that this is something fuse-specific. Other filesystems
>> aren't seeing this. Wonder why ...
>>
>> __wb_writeout_add() or its predecessor __wb_writeout_inc() have been in
>> that spot since 2015 or earlier.
>>
>> The sequence lock itself is taken inside fprop_new_period() called from
>> writeout_period() which has been there since 2012, so that's not it.
>>
>> Looking at fprop_new_period() is more interesting. Commit a91befde3503
>> removed an earlier call to local_irq_save(). It was then replaced with
>> preempt_disable() in 9458e0a78c45 but maybe removing it was just
>> erroneous?
>>
>> Anyway, that was 2022, so it doesn't answer "why is this only showing up
>> now and only for fuse?" But maybe replacing the preempt-disable with
>> irq-disable in fprop_new_period() is the right solution, regardless.
>
> So I don't have a great explanation why it is showing up only now and only
> for FUSE. It seems the fprop code is unsafe wrt interrupts because
> fprop_new_period() grabs
>
> write_seqcount_begin(&p->sequence);
>
> and if IO completion interrupt on this CPU comes while p->sequence is odd,
> the call to
>
> read_seqcount_begin(&p->sequence);
>
> in __folio_end_writeback() -> __wb_writeout_add() -> wb_domain_writeout_add()
> -> __fprop_add_percpu_max() -> fprop_fraction_percpu() will loop
> indefinitely. *However* this isn't in fact possible because
> fprop_new_period() is only called from a timer code and thus in softirq
> context and thus IO completion softirq cannot really preempt it.
>
> But for the same reason I don't think what lockdep complains about is
> really possible because xa_lock gets only used from IO completion softirq as
> well. Or can we really acquire it from some hard irq context? Based on
> lockdep report at least lockdep things IO completion runs in hardirq
> context but then I don't see why we're not seeing complaints like this all
> the time and even deadlocks I've described above. I guess I'll have to do
> some experimentation to refresh how these things behave these days...
>
> Honza
Is there anything that speaks about the patch I had posted?
__wb_writeout_add() doesn't need the xa lock?
Thanks,
Bernd
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: __folio_end_writeback() lockdep issue
2026-01-12 13:13 ` Bernd Schubert
@ 2026-01-12 17:20 ` Jan Kara
0 siblings, 0 replies; 8+ messages in thread
From: Jan Kara @ 2026-01-12 17:20 UTC (permalink / raw)
To: Bernd Schubert
Cc: Jan Kara, Matthew Wilcox, Joanne Koong, Miklos Szeredi,
Horst Birthelmer, linux-fsdevel, Andrew Morton, linux-mm,
David Hildenbrand (Red Hat)
On Mon 12-01-26 14:13:26, Bernd Schubert wrote:
>
>
> On 1/12/26 14:06, Jan Kara wrote:
> > On Sat 10-01-26 16:30:28, Matthew Wilcox wrote:
> >> On Sat, Jan 10, 2026 at 04:31:28PM +0100, Bernd Schubert wrote:
> >>> [ 872.499480] Possible interrupt unsafe locking scenario:
> >>> [ 872.499480]
> >>> [ 872.500326] CPU0 CPU1
> >>> [ 872.500906] ---- ----
> >>> [ 872.501464] lock(&p->sequence);
> >>> [ 872.501923] local_irq_disable();
> >>> [ 872.502615] lock(&xa->xa_lock#4);
> >>> [ 872.503327] lock(&p->sequence);
> >>> [ 872.504116] <Interrupt>
> >>> [ 872.504513] lock(&xa->xa_lock#4);
> >>>
> >>>
> >>> Which is introduced by commit 2841808f35ee for all file systems.
> >>> The should be rather generic - I shouldn't be the only one seeing
> >>> it?
> >>
> >> Oh wow, 2841808f35ee has a very confusing commit message. It implies
> >> that _no_ filesystem uses BDI_CAP_WRITEBACK_ACCT, but what it really
> >> means is that no filesystem now _clears_ BDI_CAP_WRITEBACK_ACCT, so
> >> all filesystems do use this code path and therefore the flag can be
> >> removed. And that matches the code change.
> >>
> >> So you should be able to reproduce this problem with commit 494d2f508883
> >> as well?
> >>
> >> That tells me that this is something fuse-specific. Other filesystems
> >> aren't seeing this. Wonder why ...
> >>
> >> __wb_writeout_add() or its predecessor __wb_writeout_inc() have been in
> >> that spot since 2015 or earlier.
> >>
> >> The sequence lock itself is taken inside fprop_new_period() called from
> >> writeout_period() which has been there since 2012, so that's not it.
> >>
> >> Looking at fprop_new_period() is more interesting. Commit a91befde3503
> >> removed an earlier call to local_irq_save(). It was then replaced with
> >> preempt_disable() in 9458e0a78c45 but maybe removing it was just
> >> erroneous?
> >>
> >> Anyway, that was 2022, so it doesn't answer "why is this only showing up
> >> now and only for fuse?" But maybe replacing the preempt-disable with
> >> irq-disable in fprop_new_period() is the right solution, regardless.
> >
> > So I don't have a great explanation why it is showing up only now and only
> > for FUSE. It seems the fprop code is unsafe wrt interrupts because
> > fprop_new_period() grabs
> >
> > write_seqcount_begin(&p->sequence);
> >
> > and if IO completion interrupt on this CPU comes while p->sequence is odd,
> > the call to
> >
> > read_seqcount_begin(&p->sequence);
> >
> > in __folio_end_writeback() -> __wb_writeout_add() -> wb_domain_writeout_add()
> > -> __fprop_add_percpu_max() -> fprop_fraction_percpu() will loop
> > indefinitely. *However* this isn't in fact possible because
> > fprop_new_period() is only called from a timer code and thus in softirq
> > context and thus IO completion softirq cannot really preempt it.
> >
> > But for the same reason I don't think what lockdep complains about is
> > really possible because xa_lock gets only used from IO completion softirq as
> > well. Or can we really acquire it from some hard irq context? Based on
> > lockdep report at least lockdep things IO completion runs in hardirq
> > context but then I don't see why we're not seeing complaints like this all
> > the time and even deadlocks I've described above. I guess I'll have to do
> > some experimentation to refresh how these things behave these days...
>
> Is there anything that speaks about the patch I had posted?
> __wb_writeout_add() doesn't need the xa lock?
That's a bit hairy question :) because currently xa lock is what makes sure
the wb we've got from the inode is still valid. Now that you've forced me
to think about it: If we move __wb_writeout_add() from under xa lock, inode
could be switched to different wb after we release xa lock and before we
call __wb_writeout_add(). I don't think there is anything that would
protect the wb from being freed in that case before __wb_writeout_add() and
thus we could create UAF issue if I'm not mistaken.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: __folio_end_writeback() lockdep issue
2026-01-12 13:06 ` Jan Kara
2026-01-12 13:13 ` Bernd Schubert
@ 2026-01-12 20:43 ` Joanne Koong
2026-01-20 14:35 ` Jan Kara
2 siblings, 0 replies; 8+ messages in thread
From: Joanne Koong @ 2026-01-12 20:43 UTC (permalink / raw)
To: Jan Kara
Cc: Matthew Wilcox, Bernd Schubert, Miklos Szeredi, Horst Birthelmer,
linux-fsdevel, Andrew Morton, linux-mm,
David Hildenbrand (Red Hat),
hdanton
On Mon, Jan 12, 2026 at 5:06 AM Jan Kara <jack@suse.cz> wrote:
>
> On Sat 10-01-26 16:30:28, Matthew Wilcox wrote:
> > On Sat, Jan 10, 2026 at 04:31:28PM +0100, Bernd Schubert wrote:
> > > [ 872.499480] Possible interrupt unsafe locking scenario:
> > > [ 872.499480]
> > > [ 872.500326] CPU0 CPU1
> > > [ 872.500906] ---- ----
> > > [ 872.501464] lock(&p->sequence);
> > > [ 872.501923] local_irq_disable();
> > > [ 872.502615] lock(&xa->xa_lock#4);
> > > [ 872.503327] lock(&p->sequence);
> > > [ 872.504116] <Interrupt>
> > > [ 872.504513] lock(&xa->xa_lock#4);
> > >
> > >
> > > Which is introduced by commit 2841808f35ee for all file systems.
> > > The should be rather generic - I shouldn't be the only one seeing
> > > it?
> >
> > Oh wow, 2841808f35ee has a very confusing commit message. It implies
> > that _no_ filesystem uses BDI_CAP_WRITEBACK_ACCT, but what it really
> > means is that no filesystem now _clears_ BDI_CAP_WRITEBACK_ACCT, so
> > all filesystems do use this code path and therefore the flag can be
> > removed. And that matches the code change.
> >
> > So you should be able to reproduce this problem with commit 494d2f508883
> > as well?
> >
> > That tells me that this is something fuse-specific. Other filesystems
> > aren't seeing this. Wonder why ...
> >
> > __wb_writeout_add() or its predecessor __wb_writeout_inc() have been in
> > that spot since 2015 or earlier.
> >
> > The sequence lock itself is taken inside fprop_new_period() called from
> > writeout_period() which has been there since 2012, so that's not it.
> >
> > Looking at fprop_new_period() is more interesting. Commit a91befde3503
> > removed an earlier call to local_irq_save(). It was then replaced with
> > preempt_disable() in 9458e0a78c45 but maybe removing it was just
> > erroneous?
> >
> > Anyway, that was 2022, so it doesn't answer "why is this only showing up
> > now and only for fuse?" But maybe replacing the preempt-disable with
> > irq-disable in fprop_new_period() is the right solution, regardless.
>
> So I don't have a great explanation why it is showing up only now and only
> for FUSE. It seems the fprop code is unsafe wrt interrupts because
> fprop_new_period() grabs
>
> write_seqcount_begin(&p->sequence);
>
> and if IO completion interrupt on this CPU comes while p->sequence is odd,
> the call to
>
> read_seqcount_begin(&p->sequence);
>
> in __folio_end_writeback() -> __wb_writeout_add() -> wb_domain_writeout_add()
> -> __fprop_add_percpu_max() -> fprop_fraction_percpu() will loop
> indefinitely. *However* this isn't in fact possible because
> fprop_new_period() is only called from a timer code and thus in softirq
> context and thus IO completion softirq cannot really preempt it.
>
> But for the same reason I don't think what lockdep complains about is
> really possible because xa_lock gets only used from IO completion softirq as
> well. Or can we really acquire it from some hard irq context? Based on
> lockdep report at least lockdep things IO completion runs in hardirq
> context but then I don't see why we're not seeing complaints like this all
> the time and even deadlocks I've described above. I guess I'll have to do
> some experimentation to refresh how these things behave these days...
I looked into this briefly when there was a syzbot report about this
in october [1]. The stack trace is a little different but I think it's
pretty much the exact same scenario.
I came to the same conclusion Jan came to. Copying over my notes from
October: "I don't think this is a real deadlock. My understanding of
it is that the other thread where the timer is running that is calling
fprop_new_period() could only try grabbing the xa_array lock in a
hardware interrupt handler since it disables preemption and is already
running in softirq context which means it can't get interrupted by
another softirq or scheduled out and the hardware interrupt handling
code in the drivers don't access the page cache directly."
It's unclear to me why it's only complaining about this for FUSE.
This seemed like a false positive scenario to me, but I also think the
patches Hilf tried out in [1] seemed reasonable.
Thanks,
Joanne
[1] https://lore.kernel.org/all/68e583e1.a00a0220.298cc0.0485.GAE@google.com/
>
> Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: __folio_end_writeback() lockdep issue
2026-01-12 13:06 ` Jan Kara
2026-01-12 13:13 ` Bernd Schubert
2026-01-12 20:43 ` Joanne Koong
@ 2026-01-20 14:35 ` Jan Kara
2 siblings, 0 replies; 8+ messages in thread
From: Jan Kara @ 2026-01-20 14:35 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Bernd Schubert, Jan Kara, Joanne Koong, Miklos Szeredi,
Horst Birthelmer, linux-fsdevel, Andrew Morton, linux-mm,
David Hildenbrand (Red Hat)
On Mon 12-01-26 14:06:26, Jan Kara wrote:
> On Sat 10-01-26 16:30:28, Matthew Wilcox wrote:
> > On Sat, Jan 10, 2026 at 04:31:28PM +0100, Bernd Schubert wrote:
> > > [ 872.499480] Possible interrupt unsafe locking scenario:
> > > [ 872.499480]
> > > [ 872.500326] CPU0 CPU1
> > > [ 872.500906] ---- ----
> > > [ 872.501464] lock(&p->sequence);
> > > [ 872.501923] local_irq_disable();
> > > [ 872.502615] lock(&xa->xa_lock#4);
> > > [ 872.503327] lock(&p->sequence);
> > > [ 872.504116] <Interrupt>
> > > [ 872.504513] lock(&xa->xa_lock#4);
> > >
> > >
> > > Which is introduced by commit 2841808f35ee for all file systems.
> > > The should be rather generic - I shouldn't be the only one seeing
> > > it?
> >
> > Oh wow, 2841808f35ee has a very confusing commit message. It implies
> > that _no_ filesystem uses BDI_CAP_WRITEBACK_ACCT, but what it really
> > means is that no filesystem now _clears_ BDI_CAP_WRITEBACK_ACCT, so
> > all filesystems do use this code path and therefore the flag can be
> > removed. And that matches the code change.
> >
> > So you should be able to reproduce this problem with commit 494d2f508883
> > as well?
> >
> > That tells me that this is something fuse-specific. Other filesystems
> > aren't seeing this. Wonder why ...
> >
> > __wb_writeout_add() or its predecessor __wb_writeout_inc() have been in
> > that spot since 2015 or earlier.
> >
> > The sequence lock itself is taken inside fprop_new_period() called from
> > writeout_period() which has been there since 2012, so that's not it.
> >
> > Looking at fprop_new_period() is more interesting. Commit a91befde3503
> > removed an earlier call to local_irq_save(). It was then replaced with
> > preempt_disable() in 9458e0a78c45 but maybe removing it was just
> > erroneous?
> >
> > Anyway, that was 2022, so it doesn't answer "why is this only showing up
> > now and only for fuse?" But maybe replacing the preempt-disable with
> > irq-disable in fprop_new_period() is the right solution, regardless.
>
> So I don't have a great explanation why it is showing up only now and only
> for FUSE. It seems the fprop code is unsafe wrt interrupts because
> fprop_new_period() grabs
>
> write_seqcount_begin(&p->sequence);
>
> and if IO completion interrupt on this CPU comes while p->sequence is odd,
> the call to
>
> read_seqcount_begin(&p->sequence);
>
> in __folio_end_writeback() -> __wb_writeout_add() -> wb_domain_writeout_add()
> -> __fprop_add_percpu_max() -> fprop_fraction_percpu() will loop
> indefinitely. *However* this isn't in fact possible because
> fprop_new_period() is only called from a timer code and thus in softirq
> context and thus IO completion softirq cannot really preempt it.
>
> But for the same reason I don't think what lockdep complains about is
> really possible because xa_lock gets only used from IO completion softirq as
> well. Or can we really acquire it from some hard irq context? Based on
> lockdep report at least lockdep things IO completion runs in hardirq
> context but then I don't see why we're not seeing complaints like this all
> the time and even deadlocks I've described above. I guess I'll have to do
> some experimentation to refresh how these things behave these days...
So I've got to experiment with this some more. It appears that my
recollection of block layer IO completion code is about decade old and
these days bio completion routines (and thus __folio_end_writeback()) are
getting called in hardirq context in most cases (verified this with ftrace).
Hence the flexible proportions code is indeed prone to deadlocks with
hardirqs. I'm just wondering why aren't we seeing much more lockdep reports
about this and possibly also real deadlocks... Anyway, I'll send a patch to
make fprop_new_period() irq safe.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 8+ messages in thread