linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Max Kellermann <max.kellermann@ionos.com>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	 Linux Memory Management List <linux-mm@kvack.org>,
	ceph-devel@vger.kernel.org
Subject: Re: Need advice with iput() deadlock during writeback
Date: Wed, 17 Sep 2025 11:20:33 +0200	[thread overview]
Message-ID: <CAKPOu+_B=0G-csXEw2OshD6ZJm0+Ex9dRNf6bHpVuQFgBB7-Zw@mail.gmail.com> (raw)
In-Reply-To: <CAGudoHEqNYWMqDiogc9Q_s9QMQHB6Rm_1dUzcC7B0GFBrqS=1g@mail.gmail.com>

On Wed, Sep 17, 2025 at 10:59 AM Mateusz Guzik <mjguzik@gmail.com> wrote:
> There happens to be a temporarily inactive discussion related to it, see:
> https://lore.kernel.org/linux-fsdevel/cover.1756222464.git.josef@toxicpanda.com/
>
> but also the followup:
> https://lore.kernel.org/linux-fsdevel/eeu47pjcaxkfol2o2bltigfjvrz6eecdjwtilnmnprqh7dhdn7@rqi35ya5ilmv/
>
> The patchset posted there retains inode_wait_for_writeback().

That is indeed a very interesting thread tackling a very similar
problem. I guess I can learn a bit from the discussion.

> > My idea was something like iput_safe() and that function would defer
> > the actual iput() call if the reference counter is 1 (i.e. the caller
> > is holding the last reference).
> >
>
> That's the same as my proposal.

The real difference (aside from naming) is that I wanted to change
only callers in unsafe contexts to the new function. But I guess most
people calling iput() are not aware of its dangers and if we look
closer, more existing bugs may be revealed.

For example, the Ceph bugs only occur under memory pressure (via
memcg) - only when the dcache happens to be flushed and the process
doing the writes had already exited, thus nobody else was still
holding a reference to the inode. These are rare circumstances for
normal people, but on our servers, that happens all the time.

> Note  that vast majority of real-world calls to iput already come with
> a count of 1, but it may be this is not true for ceph.

Not my experience - I traced iput() and found that this was very rare
- because the dcache is almost always holding a reference and inodes
are only ever evicted if the dcache decides to drop them.

> I suspect the best short-term fix is to implement ceph-private async
> iput with linkage coming from struct ceph_inode_info or whatever other
> struct applicable.

I had already started writing exactly this, very similar to your
sketch. That's what I'm going to finish now - and it will produce a
patch that will hopefully be appropriate for a stable backport. This
Ceph deadlock bug appears to affect all Linux versions.

>         if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
>                 init_task_work(&ci->async_task_work, __ceph_iput_async);
>                 if (!task_work_add(task, &ci->async_task_work, TWA_RESUME))
>                         return;
>         }

This part isn't useful for inodes, is it? I suppose this code exists
in fput() only to guarantee that all file handles are really closed
before returning to userspace, right? And we don't need that for
inodes?

Thanks for your helpful advice, Mateusz!

Max


  reply	other threads:[~2025-09-17  9:20 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-17  8:07 Max Kellermann
2025-09-17  8:23 ` Mateusz Guzik
2025-09-17  8:38   ` Max Kellermann
2025-09-17  8:59     ` Mateusz Guzik
2025-09-17  9:20       ` Max Kellermann [this message]
2025-09-17  9:32         ` Mateusz Guzik
2025-09-17 12:48         ` Max Kellermann
2025-09-17 20:14       ` Al Viro
2025-09-17 20:19         ` Max Kellermann
2025-09-17 20:29           ` Al Viro
2025-09-17 20:32             ` Max Kellermann
2025-09-17 20:23         ` Mateusz Guzik
2025-09-17 20:34           ` Al Viro
2025-09-17 20:36             ` Max Kellermann
2025-09-17 21:10               ` Al Viro
2025-09-17 21:19                 ` Max Kellermann
2025-09-17 21:20                   ` Mateusz Guzik
2025-09-17 20:39             ` Mateusz Guzik
2025-09-17 21:02               ` Al Viro
2025-09-17 21:18                 ` Mateusz Guzik
2025-09-17 21:42                 ` Al Viro
2025-09-17 22:58                   ` Mateusz Guzik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKPOu+_B=0G-csXEw2OshD6ZJm0+Ex9dRNf6bHpVuQFgBB7-Zw@mail.gmail.com' \
    --to=max.kellermann@ionos.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mjguzik@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox