From: Christian Brauner <brauner@kernel.org>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: Kees Cook <kees@kernel.org>, Al Viro <viro@zeniv.linux.org.uk>,
Oleg Nesterov <oleg@redhat.com>,
jack@suse.cz, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
syzkaller-bugs@googlegroups.com,
syzbot <syzbot+1c486d0b62032c82a968@syzkaller.appspotmail.com>
Subject: Re: [syzbot] [fs?] [mm?] KCSAN: data-race in bprm_execve / copy_fs (4)
Date: Sat, 22 Mar 2025 11:28:37 +0100 [thread overview]
Message-ID: <20250322-erahnen-popkultur-c871983bd560@brauner> (raw)
In-Reply-To: <lhxexfurwfcr4fgwxmnhcqeii2qrzpoy7dflpwqio463x6jhrm@rttainje5vzq>
On Sat, Mar 22, 2025 at 11:15:44AM +0100, Mateusz Guzik wrote:
> On Fri, Mar 21, 2025 at 11:26:03PM -0700, Kees Cook wrote:
> > On Sat, Mar 22, 2025 at 01:00:08AM +0000, Al Viro wrote:
> > > On Fri, Mar 21, 2025 at 09:45:39AM +0100, Christian Brauner wrote:
> > >
> > > > Afaict, the only way this data race can happen is if we jump to the
> > > > cleanup label and then reset current->fs->in_exec. If the execve was
> > > > successful there's no one to race us with CLONE_FS obviously because we
> > > > took down all other threads.
> > >
> > > Not really.
> >
> > Yeah, you found it. Thank you!
> >
> > > 1) A enters check_unsafe_execve(), sets ->in_exec to 1
> > > 2) B enters check_unsafe_execve(), sets ->in_exec to 1
> >
> > With 3 threads A, B, and C already running, fs->users == 3, so steps (1)
> > and (2) happily pass.
> >
> > > 3) A calls exec_binprm(), fails (bad binary)
> > > 4) A clears ->in_exec
> > > 5) C calls clone(2) with CLONE_FS and spawns D - ->in_exec is 0
> >
> > D's creation bumps fs->users == 4.
> >
> > > 6) B gets through exec_binprm(), kills A and C, but not D.
> > > 7) B clears ->in_exec, returns
> > >
> > > Result: B and D share ->fs, B runs suid binary.
> > >
> > > Had (5) happened prior to (2), (2) wouldn't have set ->in_exec;
> > > had (5) happened prior to (4), clone() would've failed; had
> > > (5) been delayed past (6), there wouldn't have been a thread
> > > to call clone().
> > >
> > > But in the window between (4) and (6), clone() doesn't see
> > > execve() in progress and check_unsafe_execve() has already
> > > been done, so it hadn't seen the extra thread.
> > >
> > > IOW, it really is racy. It's a counter, not a flag.
> >
> > Yeah, I would agree. Totally untested patch:
> >
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 506cd411f4ac..988b8621c079 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -1632,7 +1632,7 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
> > if (p->fs->users > n_fs)
> > bprm->unsafe |= LSM_UNSAFE_SHARE;
> > else
> > - p->fs->in_exec = 1;
> > + refcount_inc(&p->fs->in_exec);
> > spin_unlock(&p->fs->lock);
> > }
> >
> > @@ -1862,7 +1862,7 @@ static int bprm_execve(struct linux_binprm *bprm)
> >
> > sched_mm_cid_after_execve(current);
> > /* execve succeeded */
> > - current->fs->in_exec = 0;
> > + refcount_dec(¤t->fs->in_exec);
> > current->in_execve = 0;
> > rseq_execve(current);
> > user_events_execve(current);
> > @@ -1881,7 +1881,7 @@ static int bprm_execve(struct linux_binprm *bprm)
> > force_fatal_sig(SIGSEGV);
> >
> > sched_mm_cid_after_execve(current);
> > - current->fs->in_exec = 0;
> > + refcount_dec(¤t->fs->in_exec);
> > current->in_execve = 0;
> >
> > return retval;
>
> The bump is conditional and with this patch you may be issuing
> refcount_dec when you declined to refcount_inc.
>
> A special case where there are others to worry about and which proceeds
> with an exec without leaving in any indicators is imo sketchy.
>
> I would argue it would make the most sense to serialize these execs.
Yeah, I tend to agree.
And fwiw, I had proposed somewhere else already that we should start
restricting thread-group exec to the thread-group leader because
subthread exec is about as ugly as it can get.
I'd like to add a sysctl for this with the goal of removing this
completely in the future. I think there are very few if any legitimate
cases for subthread aka non-thread-group leader exec. It not just
complicates things it also means two task switch TIDs which is super
confusing from userspace (It also complicates pidfd polling but that's
an aside.). I'll wip up a patch for this once I get back from travel.
>
> Vast majority of programs are single-threaded when they exec with an
> unshared ->fs, so they don't need to bear any overhead nor complexity
> modulo a branch.
>
> For any fucky case you can park yourself waiting for any pending exec to
> finish.
next prev parent reply other threads:[~2025-03-22 10:28 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-20 19:09 syzbot
2025-03-20 20:09 ` Kees Cook
2025-03-21 1:44 ` Al Viro
2025-03-21 8:10 ` Kees Cook
2025-03-21 8:49 ` Christian Brauner
2025-03-21 8:45 ` Christian Brauner
2025-03-22 1:00 ` Al Viro
2025-03-22 6:26 ` Kees Cook
2025-03-22 10:15 ` Mateusz Guzik
2025-03-22 10:28 ` Christian Brauner [this message]
2025-03-22 10:23 ` Christian Brauner
2025-03-22 15:55 ` Oleg Nesterov
2025-03-22 18:50 ` Al Viro
2025-03-23 18:14 ` Oleg Nesterov
2025-03-23 20:57 ` Christian Brauner
2025-03-24 16:00 ` [PATCH] exec: fix the racy usage of fs_struct->in_exec Oleg Nesterov
2025-03-24 17:01 ` Mateusz Guzik
2025-03-24 18:27 ` Oleg Nesterov
2025-03-24 18:37 ` Oleg Nesterov
2025-03-24 22:24 ` Mateusz Guzik
2025-03-25 10:09 ` Oleg Nesterov
2025-03-25 11:01 ` Mateusz Guzik
2025-03-25 13:21 ` Oleg Nesterov
2025-03-25 13:30 ` Christian Brauner
2025-03-25 14:15 ` Mateusz Guzik
2025-03-25 14:46 ` Christian Brauner
2025-03-25 18:40 ` Kees Cook
2025-04-29 15:49 ` Oleg Nesterov
2025-04-29 16:57 ` Kees Cook
2025-04-29 17:12 ` Mateusz Guzik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250322-erahnen-popkultur-c871983bd560@brauner \
--to=brauner@kernel.org \
--cc=jack@suse.cz \
--cc=kees@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mjguzik@gmail.com \
--cc=oleg@redhat.com \
--cc=syzbot+1c486d0b62032c82a968@syzkaller.appspotmail.com \
--cc=syzkaller-bugs@googlegroups.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox