Re: [syzbot] [fs?] [mm?] KCSAN: data-race in bprm_execve / copy_fs (4)


From: Christian Brauner <brauner@kernel.org>
To: Kees Cook <kees@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>,
	jack@suse.cz,  linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	 syzkaller-bugs@googlegroups.com, viro@zeniv.linux.org.uk,
	 syzbot <syzbot+1c486d0b62032c82a968@syzkaller.appspotmail.com>
Subject: Re: [syzbot] [fs?] [mm?] KCSAN: data-race in bprm_execve / copy_fs (4)
Date: Fri, 21 Mar 2025 09:45:39 +0100
Message-ID: <20250321-abdecken-infomaterial-2f373f8e3b3c@brauner>
In-Reply-To: <202503201225.92C5F5FB1@keescook>

On Thu, Mar 20, 2025 at 01:09:38PM -0700, Kees Cook wrote:
> Hey look another threaded exec bug. :|
> 
> On Thu, Mar 20, 2025 at 12:09:36PM -0700, syzbot wrote:
> > ==================================================================
> > BUG: KCSAN: data-race in bprm_execve / copy_fs
> > 
> > write to 0xffff8881044f8250 of 4 bytes by task 13692 on cpu 0:
> >  bprm_execve+0x748/0x9c0 fs/exec.c:1884
> 
> This is:
> 
>         current->fs->in_exec = 0;
> 
> And is part of the execve failure path:
> 
> out:
> 	...
>         if (bprm->point_of_no_return && !fatal_signal_pending(current))
>                 force_fatal_sig(SIGSEGV);
> 
>         sched_mm_cid_after_execve(current);
>         current->fs->in_exec = 0;
>         current->in_execve = 0;
> 
>         return retval;
> }
> 
> >  do_execveat_common+0x769/0x7e0 fs/exec.c:1966
> >  do_execveat fs/exec.c:2051 [inline]
> >  __do_sys_execveat fs/exec.c:2125 [inline]
> >  __se_sys_execveat fs/exec.c:2119 [inline]
> >  __x64_sys_execveat+0x75/0x90 fs/exec.c:2119
> >  x64_sys_call+0x291e/0x2dc0 arch/x86/include/generated/asm/syscalls_64.h:323
> >  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> >  do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
> >  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > 
> > read to 0xffff8881044f8250 of 4 bytes by task 13686 on cpu 1:
> >  copy_fs+0x95/0xf0 kernel/fork.c:1770
> 
> This is:
> 
>                 if (fs->in_exec) {
> 
> Which is under lock:
> 
>         struct fs_struct *fs = current->fs;
>         if (clone_flags & CLONE_FS) {
>                 /* tsk->fs is already what we want */
>                 spin_lock(&fs->lock);
>                 /* "users" and "in_exec" locked for check_unsafe_exec() */
>                 if (fs->in_exec) {
>                         spin_unlock(&fs->lock);
>                         return -EAGAIN;
>                 }
>                 fs->users++;
>                 spin_unlock(&fs->lock);
> 
> 
> Does execve need to be taking this lock? The other thing touching it is
> check_unsafe_exec(), which takes the lock. It looks like the bprm_execve()
> lock was removed in commit 8c652f96d385 ("do_execve() must not clear
> fs->in_exec if it was set by another thread") which used the return
> value from check_unsafe_exec():
> 
>     When do_execve() succeeds, it is safe to clear ->in_exec unconditionally.
>     It can be set only if we don't share ->fs with another process, and since
>     we already killed all sub-threads either ->in_exec == 0 or we are the
>     only user of this ->fs.
> 
>     Also, we do not need fs->lock to clear fs->in_exec.
> 
> This logic was updated in commit 9e00cdb091b0 ("exec:check_unsafe_exec:
> kill the dead -EAGAIN and clear_in_exec logic"), which includes this
> rationale:
> 
>             2. "out_unmark:" in do_execve_common() is either called
>                under ->cred_guard_mutex, or after de_thread() which
>                kills other threads, so we can't race with sub-thread
>                which could set ->in_exec. And if ->fs is shared with
>                another process ->in_exec should be false anyway.
> 
> The de_thread() is part of the "point of no return" in begin_new_exec(),
> called via exec_binprm(). But the bprm_execve() error path is reachable
> from many paths prior to the point of no return.
> 
> What I can imagine here is two failing execs racing a fork:
> 
> 	A start execve
> 	B fork with CLONE_FS
> 	C start execve, reach check_unsafe_exec(), set fs->in_exec
> 	A bprm_execve() failure, clear fs->in_exec
> 	B copy_fs() increment fs->users.
> 	C bprm_execve() failure, clear fs->in_exec
> 
> But I don't think this is a "real" flaw, though, since the locking is to
> protect a _successful_ execve from a fork (i.e. getting the user count
> right). A successful execve will de_thread, and I don't see any wrong
> counting of fs->users with regard to thread lifetime.
> 
> Did I miss something in the analysis? Should we perform locking anyway,
> or add data race annotations, or something else?

Afaict, the only way this data race can happen is if we jump to the
cleanup label and then reset current->fs->in_exec. If the execve was
successful there's no one to race us with CLONE_FS obviously because we
took down all other threads.
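
To make the interleaving quoted above concrete, here is a minimal
user-space sketch that replays it deterministically in a single thread.
It is a model under stated assumptions, not kernel code: struct fs_model
and every model_* helper are hypothetical stand-ins, the spinlock is
reduced to a flag that only checks lock pairing, and
model_check_unsafe_exec() keeps just the "fs->users vs. threads sharing
fs" comparison.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical user-space stand-in for the kernel's struct fs_struct. */
struct fs_model {
	int lock_held;	/* models fs->lock; single-threaded replay only */
	int users;
	int in_exec;
};

static void model_lock(struct fs_model *fs)
{
	assert(!fs->lock_held);
	fs->lock_held = 1;
}

static void model_unlock(struct fs_model *fs)
{
	assert(fs->lock_held);
	fs->lock_held = 0;
}

/*
 * Simplified check_unsafe_exec(): n_fs is the number of threads in the
 * exec'ing thread group that share this fs.
 */
static void model_check_unsafe_exec(struct fs_model *fs, int n_fs)
{
	model_lock(fs);
	if (fs->users <= n_fs)
		fs->in_exec = 1;	/* fs not shared with another process */
	model_unlock(fs);
}

/* CLONE_FS branch of copy_fs() as quoted above. */
static int model_copy_fs(struct fs_model *fs)
{
	model_lock(fs);
	if (fs->in_exec) {
		model_unlock(fs);
		return -EAGAIN;
	}
	fs->users++;
	model_unlock(fs);
	return 0;
}

/* bprm_execve() failure path: clears the flag without taking the lock. */
static void model_execve_fail(struct fs_model *fs)
{
	fs->in_exec = 0;
}

/* Replays the A/B/C interleaving; returns what B's copy_fs() returned. */
static int replay_interleaving(struct fs_model *fs)
{
	int ret;

	model_check_unsafe_exec(fs, 3);	/* A starts execve, sets in_exec */
	model_check_unsafe_exec(fs, 3);	/* C starts execve, sets in_exec */
	model_execve_fail(fs);		/* A fails, clears in_exec */
	ret = model_copy_fs(fs);	/* B forks with CLONE_FS */
	model_execve_fail(fs);		/* C fails, clears in_exec */
	return ret;
}
```

In this model B's copy_fs() goes through while C's exec is still in
flight, which is only possible because A's failure path cleared the flag
early. Both execs fail, so nothing that relied on ->in_exec survives.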

I think the logic in commit 9e00cdb091b0 ("exec:check_unsafe_exec: kill
the dead -EAGAIN and clear_in_exec logic") is sound.

This is a harmless data race that can only happen if the execve fails.
The worst that can happen is that a subthread does clone(CLONE_FS) and
gets a spurious error because it raced with the exec'ing subthread
resetting fs->in_exec. So I think all we need is:

diff --git a/fs/exec.c b/fs/exec.c
index 506cd411f4ac..177acaf196a9 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1881,7 +1881,13 @@ static int bprm_execve(struct linux_binprm *bprm)
                force_fatal_sig(SIGSEGV);

        sched_mm_cid_after_execve(current);
-       current->fs->in_exec = 0;
+       /*
+        * If this execve failed before de_thread() and another
+        * subthread is concurrently forking with CLONE_FS they race
+        * with us resetting current->fs->in_exec. This is fine,
+        * annotate it.
+        */
+       data_race(current->fs->in_exec = 0);
        current->in_execve = 0;

        return retval;
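
For readers unfamiliar with the annotation: data_race() only tells KCSAN
that the race on the wrapped expression is intentional. The expression is
still evaluated and the store still happens. The sketch below is a rough
user-space model of that behaviour, assuming GCC or Clang (it relies on
the statement-expression and __auto_type extensions). model_data_race(),
the model_kcsan_* stubs and struct in_exec_model are hypothetical names
for illustration, not the kernel's definitions.

```c
#include <assert.h>

/*
 * Hypothetical stubs. In the kernel the corresponding hooks toggle KCSAN
 * checking for the current task; here they only track nesting depth.
 */
static int model_kcsan_depth;

static void model_kcsan_disable(void)
{
	model_kcsan_depth++;
}

static void model_kcsan_enable(void)
{
	model_kcsan_depth--;
}

/* Evaluate expr with checking "disabled" and yield its value. */
#define model_data_race(expr)			\
({						\
	model_kcsan_disable();			\
	__auto_type v_ = (expr);		\
	model_kcsan_enable();			\
	v_;					\
})

/* Hypothetical stand-in holding only the flag of interest. */
struct in_exec_model {
	int in_exec;
};

/* Models the annotated store on the bprm_execve() exit path. */
static int model_clear_in_exec(struct in_exec_model *fs)
{
	return model_data_race(fs->in_exec = 0);
}
```

The annotation changes what the checker reports, not what the code does:
after model_clear_in_exec() the flag is 0, exactly as with the plain
assignment it replaces.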
