From: Mateusz Guzik <mjguzik@gmail.com>
To: Max Kellermann <max.kellermann@ionos.com>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
    Linux Memory Management List <linux-mm@kvack.org>,
    ceph-devel@vger.kernel.org
Subject: Re: Need advice with iput() deadlock during writeback
Date: Wed, 17 Sep 2025 10:23:03 +0200
Message-ID: <4z3imll6zbzwqcyfl225xn3rc4mev6ppjnx5itmvznj2yormug@utk6twdablj3>
In-Reply-To: <CAKPOu+-QRTC_j15=Cc4YeU3TAcpQCrFWmBZcNxfnw1LndVzASg@mail.gmail.com>

On Wed, Sep 17, 2025 at 10:07:11AM +0200, Max Kellermann wrote:
> Hi,
>
> I am currently hunting several deadlock bugs in the Ceph filesystem
> that have been causing server downtimes repeatedly.
>
> One of the deadlocks looks like this:
>
> INFO: task kworker/u777:6:1270802 blocked for more than 122 seconds.
> Not tainted 6.16.7-i1-es #773
> task:kworker/u777:6 state:D stack:0 pid:1270802 tgid:1270802
> ppid:2 task_flags:0x4208060 flags:0x00004000
> Workqueue: writeback wb_workfn (flush-ceph-3)
> Call Trace:
> <TASK>
> __schedule+0x4ea/0x17d0
> schedule+0x1c/0xc0
> inode_wait_for_writeback+0x71/0xb0
> evict+0xcf/0x200
> ceph_put_wrbuffer_cap_refs+0xdd/0x220
> ceph_invalidate_folio+0x97/0xc0
> ceph_writepages_start+0x127b/0x14d0
> do_writepages+0xba/0x150
> __writeback_single_inode+0x34/0x290
> writeback_sb_inodes+0x203/0x470
> __writeback_inodes_wb+0x4c/0xe0
> wb_writeback+0x189/0x2b0
> wb_workfn+0x30b/0x3d0
> process_one_work+0x143/0x2b0
>
> There's a writeback, and during that writeback, Ceph invokes iput()
> releasing the last reference to that inode; iput() sees there's
> pending writeback and waits for writeback to complete. But there's
> nobody who will ever be able to finish writeback, because this is the
> very thread that is supposed to finish writeback, so it's waiting for
> itself.
>
So that we are clear, this is a legally held ref by ceph and you are
legally releasing it? It's not that the code assumes there is a ref
because it came from writeback?

> Anyway, I was wondering who is usually supposed to hold the inode
> reference during writeback. If there is pending writeback, somebody
> must still have a reference, or else the inode could have been evicted
> before writeback even started - does that lead to UAF when writeback
> actually happens?
>
One of the ways to stall inode teardown is to have writeback running.
Writeback does not need a reference because inode_wait_for_writeback()
explicitly waits for it, as seen in the very deadlock you encountered.

> One idea would be to postpone iput() calls to a workqueue to have it
> in a different, safe context. Of course, that sounds overhead - and it
> feels like a lousy kludge. There must be another way, a canonical
> approach to avoiding this deadlock. I have a feeling that Ceph is
> behaving weirdly, that Ceph is "holding it wrong".

Doing it *by default* is indeed a no-go.

I don't know what other filesystems are doing, but I would consider
iput() from writeback to be a bug.

However, assuming that's not avoidable, iput_async() or whatever could
be added to sort this out in a similar way to how fput() does it.

As a temporary bandaid, iput() itself could check if I_SYNC is set and,
if so, roll with the iput_async() option.

I can cook something up later.
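
To make the bandaid idea concrete, here is a rough userspace model. It
is not kernel code: the toy_* names, the struct and the hand-rolled
deferred list are all made up for illustration, and sync_in_progress
merely stands in for I_SYNC. A real version would presumably punt to a
workqueue (or something like what fput() uses) and simply block there
until writeback is done.

```c
/* Userspace model only, not kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_inode {
	int refcount;			/* stands in for i_count */
	bool sync_in_progress;		/* stands in for I_SYNC in i_state */
	bool evicted;			/* set once teardown has run */
	struct toy_inode *next_deferred;
};

/* Inodes whose final put was punted to "another context". */
static struct toy_inode *deferred_head;

static void toy_evict(struct toy_inode *inode)
{
	/* The real evict() would wait for writeback at this point. */
	assert(!inode->sync_in_progress);
	inode->evicted = true;
}

/*
 * Drop one reference. If this was the last one and writeback is in
 * progress, queue the inode instead of tearing it down in place.
 * Returns true if the final put was deferred.
 */
static bool toy_iput(struct toy_inode *inode)
{
	assert(inode->refcount > 0);
	if (--inode->refcount > 0)
		return false;

	if (inode->sync_in_progress) {
		inode->next_deferred = deferred_head;
		deferred_head = inode;
		return true;
	}

	toy_evict(inode);
	return false;
}

/*
 * Models the deferred worker. A kernel worker could just sleep until
 * writeback finishes; this single-threaded model skips busy inodes
 * instead. Returns the number of inodes torn down.
 */
static int toy_run_deferred(void)
{
	struct toy_inode **pp = &deferred_head;
	int done = 0;

	while (*pp) {
		struct toy_inode *inode = *pp;

		if (inode->sync_in_progress) {
			pp = &inode->next_deferred;
			continue;
		}
		*pp = inode->next_deferred;
		inode->next_deferred = NULL;
		toy_evict(inode);
		done++;
	}
	return done;
}
```

The point of the model is only that the thread doing writeback never
ends up in the waiting path itself: the final put is handed off and
teardown happens after the sync flag is cleared.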