From: Al Viro <viro@zeniv.linux.org.uk>
To: Samuel Wu <wusamuel@google.com>
Cc: Greg KH <gregkh@linuxfoundation.org>,
linux-fsdevel@vger.kernel.org, torvalds@linux-foundation.org,
brauner@kernel.org, jack@suse.cz, raven@themaw.net,
miklos@szeredi.hu, neil@brown.name, a.hindborg@kernel.org,
linux-mm@kvack.org, linux-efi@vger.kernel.org,
ocfs2-devel@lists.linux.dev, kees@kernel.org,
rostedt@goodmis.org, linux-usb@vger.kernel.org,
paul@paul-moore.com, casey@schaufler-ca.com,
linuxppc-dev@lists.ozlabs.org, john.johansen@canonical.com,
selinux@vger.kernel.org, borntraeger@linux.ibm.com,
bpf@vger.kernel.org, clm@meta.com,
android-kernel-team <android-kernel-team@google.com>
Subject: Re: [PATCH v4 00/54] tree-in-dcache stuff
Date: Fri, 30 Jan 2026 23:57:43 +0000
Message-ID: <20260130235743.GW3183987@ZenIV>
In-Reply-To: <CAG2Kctoqja9R1bBzdEAV15_yt=sBGkcub6C2nGE6VHMJh13=FQ@mail.gmail.com>
On Fri, Jan 30, 2026 at 02:31:54PM -0800, Samuel Wu wrote:
> On Thu, Jan 29, 2026 at 11:02 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > OK. Could you take a clone of mainline repository and in there run
> > ; git fetch git://git.kernel.org:/pub/scm/linux/kernel/git/viro/vfs.git for-wsamuel:for-wsamuel
> > then
> > ; git diff for-wsamuel e5bf5ee26663
> > to verify that for-wsamuel is identical to tree you've seen breakage on
> > ; git diff for-wsamuel-base 1544775687f0
> > to verify that for-wsamuel-base is the tree where the breakage did not reproduce
> > Then bisect from for-wsamuel-base to for-wsamuel.
> >
> > Basically, that's the offending commit split into steps; let's try to figure
> > out what causes the breakage with better resolution...
>
> Confirming that bisect points to this patch: 09e88dc22ea2 (serialize
> ffs_ep0_open() on ffs->mutex)
So we have something that does O_NDELAY opens of ep0 *and* does not retry on
EAGAIN?
How lovely...  Could you slap
	WARN_ON(ret == -EAGAIN);
right before that
	if (ret < 0)
		return ret;
in there and see which process is doing that?  Regression is a regression,
odd userland or not, but I would like to see what that userland is actually
trying to do there.
*grumble*
IMO at this point we have two problems - one is how to avoid a revert of the
tail of the tree-in-dcache series, another is how to deal with quite real
preexisting bugs in functionfs.
Another thing to try (not as a suggestion of a fix, just an attempt to figure
out how badly things would break): in current mainline replace that
	ffs_mutex_lock(&ffs->mutex, file->f_flags & O_NONBLOCK)
in ffs_ep0_open() with
	ffs_mutex_lock(&ffs->mutex, false)
and see how badly things regress for userland.  Again, I'm not saying
that this is a fix - just trying to get some sense of what the userland
is doing.
FWIW, it might make sense to try a lighter serialization in ffs_ep0_open() -
taking it there is due to the following scenario (assuming 6.18 or earlier):
ffs->state is FFS_DEACTIVATED. ffs->opened is 0. Two threads attempt to
open ep0. Here's what happens prior to these patches:
static int ffs_ep0_open(struct inode *inode, struct file *file)
{
	struct ffs_data *ffs = inode->i_private;
	if (ffs->state == FFS_CLOSING)
		return -EBUSY;
	file->private_data = ffs;
	ffs_data_opened(ffs);
with
static void ffs_data_opened(struct ffs_data *ffs)
{
	refcount_inc(&ffs->ref);
	if (atomic_add_return(1, &ffs->opened) == 1 &&
			ffs->state == FFS_DEACTIVATED) {
		ffs->state = FFS_CLOSING;
		ffs_data_reset(ffs);
	}
}
IOW, the sequence is
	if (state == FFS_CLOSING)
		return -EBUSY;
	n = atomic_add_return(1, &opened);
	if (n == 1 && state == FFS_DEACTIVATED) {
		state = FFS_CLOSING;
		ffs_data_reset();
See the race there? If the second open() comes between the
increment of ffs->opened and setting the state to FFS_CLOSING,
it will *not* fail with EBUSY - it will proceed to return to
userland, while the first sucker is crawling through the work
in ffs_data_reset()/ffs_data_clear()/ffs_epfiles_destroy().
What's more, there's nothing to stop that second opener from
calling write() on the descriptor it got. No exclusion there -
	ffs->state = FFS_READ_DESCRIPTORS;
	ffs->setup_state = FFS_NO_SETUP;
	ffs->flags = 0;
in ffs_data_reset() is *not* serialized against ffs_ep0_write().
Get preempted right after setting ->state and that write()
will go just fine, only to be surprised when the first thread
regains CPU and continues modifying the contents of *ffs
under whatever the second thread is doing.
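The interleaving described above can be replayed deterministically in a small
userland model.  This is only a sketch: the enum and field names mirror the
kernel ones, but the struct, the three step helpers and replay_race() are
invented for illustration, and a single thread plays both openers in the
problematic order.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userland model only; this is not the kernel code. */
enum ffs_state { FFS_READ_DESCRIPTORS, FFS_DEACTIVATED, FFS_CLOSING };

struct ffs_model {
	enum ffs_state state;
	atomic_int opened;
};

/* Step 1 of open: the -EBUSY check.  False means open would fail. */
static bool open_check(const struct ffs_model *ffs)
{
	return ffs->state != FFS_CLOSING;
}

/* Step 2: bump ->opened; true means this opener has to do the reset. */
static bool open_count(struct ffs_model *ffs)
{
	return atomic_fetch_add(&ffs->opened, 1) + 1 == 1 &&
	       ffs->state == FFS_DEACTIVATED;
}

/* Step 3: only now does the state become FFS_CLOSING. */
static void open_mark_closing(struct ffs_model *ffs)
{
	ffs->state = FFS_CLOSING;
}

/*
 * Replay the bad ordering.  Returns true if opener B got past the
 * -EBUSY check while opener A was already committed to the reset.
 */
static bool replay_race(void)
{
	struct ffs_model ffs = { .state = FFS_DEACTIVATED, .opened = 0 };

	bool a_ok = open_check(&ffs);		/* A passes the check */
	bool a_resets = open_count(&ffs);	/* A: 0 -> 1, must reset */
	bool b_ok = open_check(&ffs);		/* B: state not yet CLOSING */
	bool b_resets = open_count(&ffs);	/* B: 1 -> 2, no reset */
	open_mark_closing(&ffs);		/* A: too late to stop B */

	return a_ok && a_resets && b_ok && !b_resets;
}
```

In the model, opener B is only turned away if its check happens after opener
A's step 3, which is exactly the window the text describes.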
That code obviously relies upon that kind of shit being prevented
by that -EBUSY logic in ep0 open() and that logic is obviously
racy as it is.  Note that other callers of ffs_data_reset() have
a similar problem: ffs_func_set_alt(), for example, has
	if (ffs->state == FFS_DEACTIVATED) {
		ffs->state = FFS_CLOSING;
		INIT_WORK(&ffs->reset_work, ffs_reset_work);
		schedule_work(&ffs->reset_work);
		return -ENODEV;
	}
again, with no exclusion. Lose CPU just after seeing FFS_DEACTIVATED,
then have another thread open() the sucker and start going through
ffs_data_reset(), only to have us regain CPU and schedule this for
execution:
static void ffs_reset_work(struct work_struct *work)
{
	struct ffs_data *ffs = container_of(work,
		struct ffs_data, reset_work);
	ffs_data_reset(ffs);
}
IOW, stray ffs_data_reset() coming to surprise the opener who'd
just finished ffs_data_reset() during open(2) and proceeded to
write to the damn thing, etc.
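Both cases share the same shape: a plain test of ->state followed by a
separate store of FFS_CLOSING.  As a userland illustration only (C11 atomics,
made-up names, and not a proposal for what the kernel should do), the
check-and-set can be collapsed into one compare-and-exchange so that exactly
one caller wins the right to do the reset.  Note that this says nothing about
excluding write() against a reset already in progress.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative state values; hypothetical, not the kernel's enum. */
enum { ST_READ_DESCRIPTORS, ST_DEACTIVATED, ST_CLOSING };

/*
 * Atomically move DEACTIVATED -> CLOSING.  Returns true only for the
 * single caller that performed the transition; everybody else sees
 * false and must not schedule or run a reset.
 */
static bool claim_reset(atomic_int *state)
{
	int expected = ST_DEACTIVATED;

	return atomic_compare_exchange_strong(state, &expected, ST_CLOSING);
}
```

With that, a second caller arriving after the first gets false from
claim_reset() instead of going on to issue a stray reset.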
That's obviously on the "how do we fix the preexisting bugs" side
of things, though - regression needs to be dealt with ASAP anyway.