linux-mm.kvack.org archive mirror
From: Mateusz Guzik <mjguzik@gmail.com>
To: Max Kellermann <max.kellermann@ionos.com>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	 Linux Memory Management List <linux-mm@kvack.org>,
	ceph-devel@vger.kernel.org
Subject: Re: Need advice with iput() deadlock during writeback
Date: Wed, 17 Sep 2025 10:23:03 +0200
Message-ID: <4z3imll6zbzwqcyfl225xn3rc4mev6ppjnx5itmvznj2yormug@utk6twdablj3>
In-Reply-To: <CAKPOu+-QRTC_j15=Cc4YeU3TAcpQCrFWmBZcNxfnw1LndVzASg@mail.gmail.com>

On Wed, Sep 17, 2025 at 10:07:11AM +0200, Max Kellermann wrote:
> Hi,
> 
> I am currently hunting several deadlock bugs in the Ceph filesystem
> that have been causing server downtimes repeatedly.
> 
> One of the deadlocks looks like this:
> 
>  INFO: task kworker/u777:6:1270802 blocked for more than 122 seconds.
>        Not tainted 6.16.7-i1-es #773
>  task:kworker/u777:6  state:D stack:0     pid:1270802 tgid:1270802 ppid:2      task_flags:0x4208060 flags:0x00004000
>  Workqueue: writeback wb_workfn (flush-ceph-3)
>  Call Trace:
>   <TASK>
>   __schedule+0x4ea/0x17d0
>   schedule+0x1c/0xc0
>   inode_wait_for_writeback+0x71/0xb0
>   evict+0xcf/0x200
>   ceph_put_wrbuffer_cap_refs+0xdd/0x220
>   ceph_invalidate_folio+0x97/0xc0
>   ceph_writepages_start+0x127b/0x14d0
>   do_writepages+0xba/0x150
>   __writeback_single_inode+0x34/0x290
>   writeback_sb_inodes+0x203/0x470
>   __writeback_inodes_wb+0x4c/0xe0
>   wb_writeback+0x189/0x2b0
>   wb_workfn+0x30b/0x3d0
>   process_one_work+0x143/0x2b0
> 
> There's a writeback, and during that writeback, Ceph invokes iput()
> releasing the last reference to that inode; iput() sees there's
> pending writeback and waits for writeback to complete. But there's
> nobody who will ever be able to finish writeback, because this is the
> very thread that is supposed to finish writeback, so it's waiting for
> itself.
> 

Just so we are clear: is this a reference that ceph legitimately holds
and legitimately releases here? Or does the code merely assume there is
a ref because the inode came from writeback?

> Anyway, I was wondering who is usually supposed to hold the inode
> reference during writeback. If there is pending writeback, somebody
> must still have a reference, or else the inode could have been evicted
> before writeback even started - does that lead to UAF when writeback
> actually happens?
> 

Running writeback is one of the ways inode teardown gets stalled.
Writeback does not need a reference of its own, because evict() calls
inode_wait_for_writeback() and explicitly waits for it to finish, as in
the very deadlock you encountered.
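
To illustrate the point, here is a minimal userspace sketch of that
wait. It is not kernel code: the wb_model_* names, the owner field and
the three-way result are all made up for the example. It only models
the rule "teardown waits while the sync bit is set" and what happens
when the waiter is the same thread that would clear the bit.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define WB_MODEL_I_SYNC   0x1u   /* "writeback in progress" bit */
#define WB_MODEL_NO_OWNER (-1)

struct wb_model_inode {
	unsigned int state;   /* WB_MODEL_I_SYNC while writeback runs */
	int wb_owner;         /* id of the thread doing writeback, or NO_OWNER */
};

enum wb_model_result {
	WB_MODEL_EVICT_OK,       /* no writeback in flight, teardown proceeds */
	WB_MODEL_EVICT_WAITS,    /* someone else will clear the bit, so we sleep */
	WB_MODEL_SELF_DEADLOCK   /* we would sleep on a bit only we can clear */
};

/* Mark the inode as being under writeback by the given thread. */
static void wb_model_writeback_start(struct wb_model_inode *inode, int thread_id)
{
	assert(!(inode->state & WB_MODEL_I_SYNC));
	inode->state |= WB_MODEL_I_SYNC;
	inode->wb_owner = thread_id;
}

/* Writeback finished: clear the bit, which is what releases any waiter. */
static void wb_model_writeback_end(struct wb_model_inode *inode)
{
	inode->state &= ~WB_MODEL_I_SYNC;
	inode->wb_owner = WB_MODEL_NO_OWNER;
}

/*
 * Model of the teardown-side wait: no reference is involved, the sync
 * bit alone holds teardown back. Instead of actually sleeping, report
 * what the caller would experience.
 */
static enum wb_model_result
wb_model_try_evict(const struct wb_model_inode *inode, int caller)
{
	if (!(inode->state & WB_MODEL_I_SYNC))
		return WB_MODEL_EVICT_OK;
	if (inode->wb_owner == caller)
		return WB_MODEL_SELF_DEADLOCK;
	return WB_MODEL_EVICT_WAITS;
}
```

In this model, an evict attempt from any other thread merely waits,
while one issued by the writeback thread itself can never make
progress, which matches the trace above.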

> One idea would be to postpone iput() calls to a workqueue to have it
> in a different, safe context. Of course, that sounds overhead - and it
> feels like a lousy kludge. There must be another way, a canonical
> approach to avoiding this deadlock. I have a feeling that Ceph is
> behaving weirdly, that Ceph is "holding it wrong".

Doing it *by default* is indeed a no-go.

I don't know what other filesystems are doing, but I would consider
iput() from writeback to be a bug.

However, assuming that's not avoidable, iput_async() or whatever could
be added to sort this out, similar to the way fput() defers its final
release.

As a temporary band-aid, iput() itself could check whether I_SYNC is set
and, if so, take the iput_async() path.

I can cook something up later.
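
In the meantime, a rough userspace sketch of the band-aid idea, under
loud assumptions: iput_async() does not exist at this point, and
everything named toy_* below (the struct, the flag value, the deferred
list) is invented for illustration and is not the kernel's struct
inode or any real API. The only thing it shows is the control flow:
the final put is handed to a deferred list instead of evicting in
place while the sync bit is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; not the kernel's definitions. */
#define TOY_I_SYNC 0x1u          /* "writeback in progress" bit */

struct toy_inode {
	int count;                       /* reference count */
	unsigned int state;              /* TOY_I_SYNC while writeback runs */
	bool evicted;                    /* set once teardown has happened */
	struct toy_inode *next_deferred; /* link on the deferred list */
};

struct toy_deferred_list {
	struct toy_inode *head;
};

/* Teardown; must never run while writeback is in flight. */
static void toy_evict(struct toy_inode *inode)
{
	assert(!(inode->state & TOY_I_SYNC));
	inode->evicted = true;
}

/*
 * Drop one reference. If this is the last one and writeback is in
 * progress, keep the reference and hand it to the deferred list so
 * that the caller never waits on itself. Returns true when the final
 * put was deferred.
 */
static bool toy_iput(struct toy_inode *inode, struct toy_deferred_list *dl)
{
	assert(inode->count > 0);
	if (inode->count > 1) {
		inode->count--;
		return false;
	}
	if (inode->state & TOY_I_SYNC) {
		inode->next_deferred = dl->head;
		dl->head = inode;
		return true;
	}
	inode->count = 0;
	toy_evict(inode);
	return false;
}

/*
 * Stand-in for the worker that finishes deferred puts from a different
 * context. In this toy the caller must run it only after writeback has
 * ended; a real worker would instead be free to sleep until it has.
 * Returns the number of inodes torn down.
 */
static int toy_run_deferred(struct toy_deferred_list *dl)
{
	struct toy_inode *inode = dl->head;
	int done = 0;

	dl->head = NULL;
	while (inode) {
		struct toy_inode *next = inode->next_deferred;

		inode->next_deferred = NULL;
		inode->count = 0;
		toy_evict(inode);
		done++;
		inode = next;
	}
	return done;
}
```

The cost of this shape is that teardown of such an inode is delayed
until the deferred work runs, which is why it would only make sense on
the path where I_SYNC is actually set rather than by default.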

