Re: __folio_end_writeback() lockdep issue

From: Bernd Schubert <bernd@bsbernd.com>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	Horst Birthelmer <hbirthelmer@ddn.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"David Hildenbrand (Red Hat)" <david@kernel.org>
Subject: Re: __folio_end_writeback() lockdep issue
Date: Sat, 10 Jan 2026 16:31:28 +0100
Message-ID: <b7b72183-f9e1-4e58-b40f-45a267cc6831@bsbernd.com>
In-Reply-To: <9b845a47-9aee-43dd-99bc-1a82bea00442@bsbernd.com>



On 1/10/26 15:05, Bernd Schubert wrote:
> Hi Joanne,
> 
> I ran into lockdep issues while testing 6.19, and I think they are due
> to holding fi->lock in fuse_writepage_end() until
> fuse_writepage_finish() is complete.
> 
> The proposed patch is:
> 
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 01bc894e9c2b..b2cd270c75d8 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -2000,8 +2000,8 @@ static void fuse_writepage_end(struct fuse_mount *fm, struct fuse_args *args,
>                 fuse_invalidate_attr_mask(inode, FUSE_STATX_MODIFY);
>         spin_lock(&fi->lock);
>         fi->writectr--;
> -       fuse_writepage_finish(wpa);
>         spin_unlock(&fi->lock);
> +       fuse_writepage_finish(wpa);
>         fuse_writepage_free(wpa);
>  }
>  
> 
> But then there is this comment in fuse_writepage_finish():
> 
> 		/*
> 		 * Benchmarks showed that ending writeback within the
> 		 * scope of the fi->lock alleviates xarray lock
> 		 * contention and noticeably improves performance.
> 		 */
> 
> 


Hmm, actually the critical part is:

[  872.499480]  Possible interrupt unsafe locking scenario:
[  872.499480] 
[  872.500326]        CPU0                    CPU1
[  872.500906]        ----                    ----
[  872.501464]   lock(&p->sequence);
[  872.501923]                                local_irq_disable();
[  872.502615]                                lock(&xa->xa_lock#4);
[  872.503327]                                lock(&p->sequence);
[  872.504116]   <Interrupt>
[  872.504513]     lock(&xa->xa_lock#4);


This was introduced by commit 2841808f35ee and affects all file systems,
so it should be rather generic. I shouldn't be the only one seeing it?

So, something like this?

mm: fix HARDIRQ-safe -> HARDIRQ-unsafe lock order in __folio_end_writeback()

__wb_writeout_add() is called while holding xa_lock (HARDIRQ-safe),
but it eventually calls fprop_fraction_percpu() which acquires
p->sequence (HARDIRQ-unsafe via seqcount), creating a lock ordering
violation.

Call trace:
  __folio_end_writeback()
    xa_lock_irqsave(&mapping->i_pages)     <- HARDIRQ-safe
      __wb_writeout_add()
        wb_domain_writeout_add()
          __fprop_add_percpu_max()
            fprop_fraction_percpu()
              read_seqcount_begin(&p->sequence)  <- HARDIRQ-unsafe

Possible deadlock scenario:

       CPU0                    CPU1
       ----                    ----
  lock(p->sequence)
                               local_irq_disable()
                               lock(xa_lock)
                               lock(p->sequence)
  <hardirq>
    lock(xa_lock)

                   *** DEADLOCK ***

Fix by moving __wb_writeout_add() outside the xa_lock critical section.
It only accesses percpu counters and global writeback domain structures,
none of which require xa_lock protection.

Fixes: 2841808f35ee ("mm: remove BDI_CAP_WRITEBACK_ACCT")
Signed-off-by: Bernd Schubert <bschubert@ddn.com>

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index ccdeb0e84d39..ab83e3cbbf94 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2994,7 +2994,6 @@ bool __folio_end_writeback(struct folio *folio)
 
                wb = inode_to_wb(inode);
                wb_stat_mod(wb, WB_WRITEBACK, -nr);
-               __wb_writeout_add(wb, nr);
                if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK)) {
                        wb_inode_writeback_end(wb);
                        if (mapping->host)
@@ -3002,6 +3001,7 @@ bool __folio_end_writeback(struct folio *folio)
                }
 
                xa_unlock_irqrestore(&mapping->i_pages, flags);
+               __wb_writeout_add(wb, nr);
        } else {
                ret = folio_xor_flags_has_waiters(folio, 1 << PG_writeback);
        }
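
For illustration only, and not part of the patch: a minimal userspace
sketch of the same reordering, assuming pthread mutexes as stand-ins for
xa_lock and p->sequence. All demo_* names are hypothetical, interrupts
are not modelled at all, and the 'held' flag is a single-threaded debug
aid. It only shows that the accounting call no longer runs inside the
outer critical section.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* All demo_* names are hypothetical stand-ins, not kernel symbols. */
struct demo_mapping {
	pthread_mutex_t lock;   /* stands in for xa_lock on mapping->i_pages */
	bool held;              /* debug aid: true while 'lock' is held */
	long nr_writeback;      /* state that really needs 'lock' */
};

struct demo_stats {
	pthread_mutex_t seq;    /* stands in for p->sequence */
	long completed;         /* stands in for the writeout counters */
	int calls_under_lock;   /* calls made while the mapping lock was held */
};

/* Accounting step: takes only the stats lock. */
static void demo_writeout_add(struct demo_stats *st,
			      const struct demo_mapping *m, long nr)
{
	/* Record whether the caller still holds the mapping lock. */
	if (m->held)
		st->calls_under_lock++;

	pthread_mutex_lock(&st->seq);
	st->completed += nr;
	pthread_mutex_unlock(&st->seq);
}

static void demo_end_writeback(struct demo_mapping *m,
			       struct demo_stats *st, long nr)
{
	assert(nr >= 0);

	pthread_mutex_lock(&m->lock);
	m->held = true;
	m->nr_writeback -= nr;  /* the part that needs the lock */
	m->held = false;
	pthread_mutex_unlock(&m->lock);

	/* Moved here from inside the critical section above. */
	demo_writeout_add(st, m, nr);
}
```

With this ordering the two stand-in locks are never nested, which is
the property the patch is after for xa_lock and p->sequence.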



Thanks,
Bernd
