linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mateusz Guzik <mjguzik@gmail.com>
To: Max Kellermann <max.kellermann@ionos.com>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	 Linux Memory Management List <linux-mm@kvack.org>,
	ceph-devel@vger.kernel.org
Subject: Re: Need advice with iput() deadlock during writeback
Date: Wed, 17 Sep 2025 10:59:23 +0200	[thread overview]
Message-ID: <CAGudoHEqNYWMqDiogc9Q_s9QMQHB6Rm_1dUzcC7B0GFBrqS=1g@mail.gmail.com> (raw)
In-Reply-To: <CAKPOu+--m8eppmF5+fofG=AKAMu5K_meF44UH4XiL8V3_X_rJg@mail.gmail.com>

On Wed, Sep 17, 2025 at 10:38 AM Max Kellermann
<max.kellermann@ionos.com> wrote:
>
> On Wed, Sep 17, 2025 at 10:23 AM Mateusz Guzik <mjguzik@gmail.com> wrote:
> > One of the ways to stall inode teardown is to have writeback running. It
> > does not need a reference because inode_wait_for_writeback() explicitly
> > waits for it like in the very deadlock you encountered.
>
> Ah, right. No UAF. But I wonder if that's the best way to do it - why
> not let writeback hold its own reference, and eliminate
> inode_wait_for_writeback()? (and its latency)
>

There happens to be a temporarily inactive discussion related to it, see:
https://lore.kernel.org/linux-fsdevel/cover.1756222464.git.josef@toxicpanda.com/

but also the followup:
https://lore.kernel.org/linux-fsdevel/eeu47pjcaxkfol2o2bltigfjvrz6eecdjwtilnmnprqh7dhdn7@rqi35ya5ilmv/

The patchset posted there retains inode_wait_for_writeback().

Suppose you are to get rid of it. In that case you have a corner case
where the writeback thread has to issue ->evict_inode() for arbitrary
filesystems, and that's quite a change and I'm not at all convinced
that's safe.

> > However, assuming that's not avoidable, iput_async() or whatever could
> > be added to sort this out in a similar way fput() is.
> >
> > As a temporary bandaid iput() itself could check if I_SYNC is set and if
> > so roll with the iput_async() option.
> >
> > I can cook something up later.
>
> My idea was something like iput_safe() and that function would defer
> the actual iput() call if the reference counter is 1 (i.e. the caller
> is holding the last reference).
>

That's the same as my proposal.

> Almost all iput() calls are fine because they just do an atomic
> decrement, but all kinds of scary stuff can happen if the last
> reference is released. Callers that are not confident that this is
> safe shall then use my new iput_safe() instead of iput().
>

Note  that vast majority of real-world calls to iput already come with
a count of 1, but it may be this is not true for ceph.

> I can write such a patch, but I wanted you experts to first confirm
> that this is a good idea that would be acceptable for merging (or
> maybe Ceph is just weird and there's a simpler way to avoid this).
>

So the problem here is where to put linkage for the delegated work.

Another issue is that as is nobody knows who set I_SYNC and that
probably should change on kernel with CONFIG_DEBUG_VFS. As luck would
have it I posted a related patchset here:
https://lore.kernel.org/linux-fsdevel/20250916135900.2170346-1-mjguzik@gmail.com/T/#t

with that in place and debug enabled we can panic early on the first
iput so you don't have to wait to trigger the problem

I suspect the best short-term fix is to implement ceph-private async
iput with linkage coming from struct ceph_inode_info or whatever other
struct applicable.

You can use __fput_deferred() as a reference (put intended). Note this
one assumes the obj is already unrefed, but for iput_async it would be
best to also postpone it to that routine.

A sketch, incomplete:
static DECLARE_DELAYED_WORK(delayed_ceph_iput_work, delayed_ceph_iput);

static void __ceph_iput_async(struct callback_head *work)
{
        struct ceph_inode_info *ci = container_of(work, struct
ceph_inode_info, async_task_work);
        iput(&ci->netfs.inode);
}

void ceph_iput_async(struct ceph_inode_info *ci)
{
        struct inode *inode = &ci->netfs.inode;

        if (atomic_add_unless(&inode->i_count, -1, 1))
                return;

        if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
                init_task_work(&ci->async_task_work, __ceph_iput_async);
                if (!task_work_add(task, &ci->async_task_work, TWA_RESUME))
                        return;
        }

        if (llist_add(&ci->async_llist, &delayed_ceph_iput_list))
                schedule_delayed_work(&delayed_ceph_iput_work, 1);
}


  reply	other threads:[~2025-09-17  8:59 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-17  8:07 Max Kellermann
2025-09-17  8:23 ` Mateusz Guzik
2025-09-17  8:38   ` Max Kellermann
2025-09-17  8:59     ` Mateusz Guzik [this message]
2025-09-17  9:20       ` Max Kellermann
2025-09-17  9:32         ` Mateusz Guzik
2025-09-17 12:48         ` Max Kellermann
2025-09-17 20:14       ` Al Viro
2025-09-17 20:19         ` Max Kellermann
2025-09-17 20:29           ` Al Viro
2025-09-17 20:32             ` Max Kellermann
2025-09-17 20:23         ` Mateusz Guzik
2025-09-17 20:34           ` Al Viro
2025-09-17 20:36             ` Max Kellermann
2025-09-17 21:10               ` Al Viro
2025-09-17 21:19                 ` Max Kellermann
2025-09-17 21:20                   ` Mateusz Guzik
2025-09-17 20:39             ` Mateusz Guzik
2025-09-17 21:02               ` Al Viro
2025-09-17 21:18                 ` Mateusz Guzik
2025-09-17 21:42                 ` Al Viro
2025-09-17 22:58                   ` Mateusz Guzik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGudoHEqNYWMqDiogc9Q_s9QMQHB6Rm_1dUzcC7B0GFBrqS=1g@mail.gmail.com' \
    --to=mjguzik@gmail.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=max.kellermann@ionos.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox