* Need advice with iput() deadlock during writeback
@ 2025-09-17  8:07 Max Kellermann
From: Max Kellermann @ 2025-09-17  8:07 UTC (permalink / raw)
  To: linux-fsdevel, Linux Memory Management List, ceph-devel

Hi,

I am currently hunting several deadlock bugs in the Ceph filesystem
that have been causing server downtimes repeatedly.

One of the deadlocks looks like this:

 INFO: task kworker/u777:6:1270802 blocked for more than 122 seconds.
       Not tainted 6.16.7-i1-es #773
 task:kworker/u777:6  state:D stack:0     pid:1270802 tgid:1270802 ppid:2      task_flags:0x4208060 flags:0x00004000
 Workqueue: writeback wb_workfn (flush-ceph-3)
 Call Trace:
  <TASK>
  __schedule+0x4ea/0x17d0
  schedule+0x1c/0xc0
  inode_wait_for_writeback+0x71/0xb0
  evict+0xcf/0x200
  ceph_put_wrbuffer_cap_refs+0xdd/0x220
  ceph_invalidate_folio+0x97/0xc0
  ceph_writepages_start+0x127b/0x14d0
  do_writepages+0xba/0x150
  __writeback_single_inode+0x34/0x290
  writeback_sb_inodes+0x203/0x470
  __writeback_inodes_wb+0x4c/0xe0
  wb_writeback+0x189/0x2b0
  wb_workfn+0x30b/0x3d0
  process_one_work+0x143/0x2b0

There's a writeback, and during that writeback, Ceph invokes iput()
releasing the last reference to that inode; iput() sees there's
pending writeback and waits for writeback to complete. But there's
nobody who will ever be able to finish writeback, because this is the
very thread that is supposed to finish writeback, so it's waiting for
itself.
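
The self-wait described above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: model_inode,
model_iput and the put_result values are hypothetical names invented for
illustration, not kernel APIs, and the under_writeback flag only loosely
stands in for whatever state inode_wait_for_writeback() waits on. Instead
of actually blocking, the model reports what a final put would do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_inode {
    int refcount;          /* number of references currently held */
    bool under_writeback;  /* set while some thread is writing this inode back */
    int writeback_owner;   /* id of the only thread that can clear the flag */
};

enum put_result {
    PUT_KEPT,           /* other references remain, nothing else happens */
    PUT_EVICTED,        /* last reference dropped, eviction proceeds */
    PUT_WOULD_WAIT,     /* eviction waits for another thread to finish writeback */
    PUT_SELF_DEADLOCK   /* eviction waits for the caller itself: never returns */
};

/* Drop one reference and classify what the final put would have to do. */
static enum put_result model_iput(struct model_inode *inode, int caller)
{
    assert(inode->refcount > 0);

    if (--inode->refcount > 0)
        return PUT_KEPT;

    if (!inode->under_writeback)
        return PUT_EVICTED;

    /* The waiter is the same thread that must complete writeback. */
    if (inode->writeback_owner == caller)
        return PUT_SELF_DEADLOCK;

    return PUT_WOULD_WAIT;
}
```

In this model, a final model_iput() issued by thread 7 on an inode whose
writeback_owner is also 7 returns PUT_SELF_DEADLOCK, which corresponds to
the situation in the trace: the flusher worker drops the last reference
while it is still the one responsible for completing writeback.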

It seems to me that iput() is a rather dangerous function because it
can easily block for a long time, and must never be called while
holding any lock. I wonder if all iput() callers are aware of this...

Anyway, I was wondering who is usually supposed to hold the inode
reference during writeback. If there is pending writeback, somebody
must still have a reference, or else the inode could have been evicted
before writeback even started - does that lead to UAF when writeback
actually happens?

One idea would be to postpone iput() calls to a workqueue to have it
in a different, safe context. Of course, that sounds like overhead - and it
feels like a lousy kludge. There must be another way, a canonical
approach to avoiding this deadlock. I have a feeling that Ceph is
behaving weirdly, that Ceph is "holding it wrong".
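
For illustration, the "postpone iput() to another context" idea might look
roughly like the following userspace sketch. Everything here is an
assumption made for the example: demo_inode, deferred_puts, defer_iput and
drain_deferred_puts are hypothetical names, the queue is a plain
single-threaded list with no locking, and an inode may be queued at most
once at a time. It is not how Ceph or the VFS implements anything; it only
shows the shape of the idea, namely that the writeback path records the
pending put and a different context performs it later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct demo_inode {
    int refcount;                     /* references currently held */
    int evicted;                      /* set once the last reference is gone */
    struct demo_inode *next_deferred; /* link in the deferred-put list */
};

struct deferred_puts {
    struct demo_inode *head;          /* inodes whose put has been postponed */
};

/*
 * Called from the context that must not block (the writeback path in the
 * scenario above): only record the put, do not drop the reference yet.
 * Limitation of this sketch: an inode may be queued only once at a time.
 */
static void defer_iput(struct deferred_puts *q, struct demo_inode *inode)
{
    inode->next_deferred = q->head;
    q->head = inode;
}

/*
 * Called later from a separate, safe context: drop the postponed
 * references. Returns how many inodes lost their last reference.
 */
static int drain_deferred_puts(struct deferred_puts *q)
{
    int evicted = 0;
    struct demo_inode *inode = q->head;

    q->head = NULL;
    while (inode) {
        struct demo_inode *next = inode->next_deferred;

        inode->next_deferred = NULL;
        assert(inode->refcount > 0);
        if (--inode->refcount == 0) {
            inode->evicted = 1;
            evicted++;
        }
        inode = next;
    }
    return evicted;
}
```

The reference stays held until drain_deferred_puts() runs, so the inode
cannot be evicted from inside the writeback path. The cost is the overhead
mentioned above: extra bookkeeping per put and a delay before the inode is
actually released.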

I tried to trace ext4 writeback but found the inode reference counter
to be 1, the only reference being held by the dcache. But what if I
flush the dcache in the middle of writeback... I don't get it.

FS and MM experts - please help me understand how this is supposed to work.

Max

