From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 21 Mar 2025 09:45:39 +0100
From: Christian Brauner <brauner@kernel.org>
To: Kees Cook
Cc: Oleg Nesterov, jack@suse.cz, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	syzkaller-bugs@googlegroups.com, viro@zeniv.linux.org.uk, syzbot
Subject: Re: [syzbot] [fs?] [mm?] KCSAN: data-race in bprm_execve / copy_fs (4)
Message-ID: <20250321-abdecken-infomaterial-2f373f8e3b3c@brauner>
References: <67dc67f0.050a0220.25ae54.001f.GAE@google.com>
 <202503201225.92C5F5FB1@keescook>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <202503201225.92C5F5FB1@keescook>

On Thu, Mar 20, 2025 at 01:09:38PM -0700, Kees Cook wrote:
> Hey look another threaded exec bug. :|
> 
> On Thu, Mar 20, 2025 at 12:09:36PM -0700, syzbot wrote:
> > ==================================================================
> > BUG: KCSAN: data-race in bprm_execve / copy_fs
> > 
> > write to 0xffff8881044f8250 of 4 bytes by task 13692 on cpu 0:
> >  bprm_execve+0x748/0x9c0 fs/exec.c:1884
> 
> This is:
> 
> 	current->fs->in_exec = 0;
> 
> And is part of the execve failure path:
> 
> out:
> ...
> 	if (bprm->point_of_no_return && !fatal_signal_pending(current))
> 		force_fatal_sig(SIGSEGV);
> 
> 	sched_mm_cid_after_execve(current);
> 	current->fs->in_exec = 0;
> 	current->in_execve = 0;
> 
> 	return retval;
> }
> 
> >  do_execveat_common+0x769/0x7e0 fs/exec.c:1966
> >  do_execveat fs/exec.c:2051 [inline]
> >  __do_sys_execveat fs/exec.c:2125 [inline]
> >  __se_sys_execveat fs/exec.c:2119 [inline]
> >  __x64_sys_execveat+0x75/0x90 fs/exec.c:2119
> >  x64_sys_call+0x291e/0x2dc0 arch/x86/include/generated/asm/syscalls_64.h:323
> >  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> >  do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
> >  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > 
> > read to 0xffff8881044f8250 of 4 bytes by task 13686 on cpu 1:
> >  copy_fs+0x95/0xf0 kernel/fork.c:1770
> 
> This is:
> 
> 	if (fs->in_exec) {
> 
> Which is under lock:
> 
> 	struct fs_struct *fs = current->fs;
> 	if (clone_flags & CLONE_FS) {
> 		/* tsk->fs is already what we want */
> 		spin_lock(&fs->lock);
> 		/* "users" and "in_exec" locked for check_unsafe_exec() */
> 		if (fs->in_exec) {
> 			spin_unlock(&fs->lock);
> 			return -EAGAIN;
> 		}
> 		fs->users++;
> 		spin_unlock(&fs->lock);
> 
> Does execve need to be taking this lock? The other thing touching it is
> check_unsafe_exec(), which takes the lock. It looks like the bprm_execve()
> lock was removed in commit 8c652f96d385 ("do_execve() must not clear
> fs->in_exec if it was set by another thread") which used the return
> value from check_unsafe_exec():
> 
>     When do_execve() succeeds, it is safe to clear ->in_exec
>     unconditionally. It can be set only if we don't share ->fs with
>     another process, and since we already killed all sub-threads
>     either ->in_exec == 0 or we are the only user of this ->fs.
> 
>     Also, we do not need fs->lock to clear fs->in_exec.
> 
> This logic was updated in commit 9e00cdb091b0 ("exec:check_unsafe_exec:
> kill the dead -EAGAIN and clear_in_exec logic"), which includes this
> rationale:
> 
>     2. "out_unmark:" in do_execve_common() is either called
>        under ->cred_guard_mutex, or after de_thread() which
>        kills other threads, so we can't race with sub-thread
>        which could set ->in_exec. And if ->fs is shared with
>        another process ->in_exec should be false anyway.
> 
> The de_thread() is part of the "point of no return", reached via
> exec_binprm(). But the bprm_execve() error path is reachable from many
> paths prior to the point of no return.
> 
> What I can imagine here is two failing execs racing a fork:
> 
> A: start execve
> B: fork with CLONE_FS
> C: start execve, reach check_unsafe_exec(), set fs->in_exec
> A: bprm_execve() failure, clear fs->in_exec
> B: copy_fs(), increment fs->users
> C: bprm_execve() failure, clear fs->in_exec
> 
> I don't think this is a "real" flaw, though, since the locking is to
> protect a _successful_ execve from a fork (i.e. getting the user count
> right). A successful execve will de_thread(), and I don't see any wrong
> counting of fs->users with regard to thread lifetime.
> 
> Did I miss something in the analysis? Should we perform locking anyway,
> or add data race annotations, or something else?

Afaict, the only way this data race can happen is if we jump to the
cleanup label and then reset current->fs->in_exec. If the execve was
successful there's no one to race us with CLONE_FS, obviously, because
we took down all other threads.

I think the logic in commit 9e00cdb091b0 ("exec:check_unsafe_exec: kill
the dead -EAGAIN and clear_in_exec logic") is sound.
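For context, the exec-side half of this handshake is check_unsafe_exec().
Roughly sketched below, paraphrased from fs/exec.c rather than quoted
verbatim (the _sketch name and the abridged details are illustrative
only): it sets fs->in_exec under fs->lock, and only when no process
outside the current thread group shares the fs_struct.

/*
 * Simplified sketch of check_unsafe_exec()'s fs->in_exec handling,
 * paraphrased from fs/exec.c (not the verbatim kernel code).
 */
static void check_unsafe_exec_sketch(struct linux_binprm *bprm)
{
	struct task_struct *p = current, *t;
	unsigned int n_fs = 1;

	spin_lock(&p->fs->lock);
	rcu_read_lock();
	/* Count how many of our own threads share ->fs. */
	for_other_threads(p, t) {
		if (t->fs == p->fs)
			n_fs++;
	}
	rcu_read_unlock();

	if (p->fs->users > n_fs) {
		/* ->fs is shared with another process: never mark in_exec. */
		bprm->unsafe |= LSM_UNSAFE_SHARE;
	} else {
		/* Only our own (about to be killed) threads share ->fs. */
		p->fs->in_exec = 1;
	}
	spin_unlock(&p->fs->lock);
}

So check_unsafe_exec() and copy_fs() both serialize on fs->lock; the
unlocked access is the clearing in bprm_execve()'s cleanup path
discussed above.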
This is a harmless data race that can only happen if the execve fails.
The worst that can happen is that a subthread does clone(CLONE_FS) and
gets a spurious error because it raced with the exec'ing subthread
resetting fs->in_exec.

So I think all we need is:

diff --git a/fs/exec.c b/fs/exec.c
index 506cd411f4ac..177acaf196a9 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1881,7 +1881,13 @@ static int bprm_execve(struct linux_binprm *bprm)
 		force_fatal_sig(SIGSEGV);
 
 	sched_mm_cid_after_execve(current);
-	current->fs->in_exec = 0;
+	/*
+	 * If this execve failed before de_thread() and another
+	 * subthread is concurrently forking with CLONE_FS they race
+	 * with us resetting current->fs->in_exec. This is fine,
+	 * annotate it.
+	 */
+	data_race(current->fs->in_exec = 0);
 	current->in_execve = 0;
 
 	return retval;
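To make the "worst case" above concrete, here is a rough, hypothetical
userspace sketch (not the syzbot reproducer; the target path, file
contents and iteration count are made up): one thread loops on an
execve() of a garbage executable, so check_unsafe_exec() runs but the
exec fails with ENOEXEC and takes the error path, while the main thread
clones children with CLONE_FS. A clone() that lands in the window where
fs->in_exec is still set fails with a spurious, harmless EAGAIN.

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical target: executable, but not a valid binary, so execve()
 * runs check_unsafe_exec() and then fails with ENOEXEC. */
static const char *target = "/tmp/in_exec_race_target";

static void *exec_failer(void *arg)
{
	char *argv[] = { (char *)target, NULL };

	(void)arg;
	for (;;)
		execve(target, argv, NULL); /* always fails, clears fs->in_exec */
	return NULL;
}

static int child_fn(void *arg)
{
	(void)arg;
	return 0;
}

int main(void)
{
	static char stack[64 * 1024] __attribute__((aligned(16)));
	pthread_t t;
	int fd, i;

	/* Create the garbage "executable". */
	fd = open(target, O_CREAT | O_TRUNC | O_WRONLY, 0755);
	if (fd < 0 || write(fd, "\1garbage\n", 9) < 0)
		exit(1);
	close(fd);

	if (pthread_create(&t, NULL, exec_failer, NULL))
		exit(1);

	for (i = 0; i < 100000; i++) {
		/* Like copy_fs() with CLONE_FS: checks fs->in_exec under fs->lock. */
		pid_t pid = clone(child_fn, stack + sizeof(stack),
				  CLONE_FS | SIGCHLD, NULL);

		if (pid < 0) {
			if (errno == EAGAIN) {
				/* Spurious failure: raced with the failing execve. */
				fprintf(stderr, "clone: EAGAIN (raced with exec)\n");
				continue;
			}
			perror("clone");
			exit(1);
		}
		waitpid(pid, NULL, 0);
	}
	return 0;
}

Whether the EAGAIN is actually observed depends on timing; the point is
only that retrying is always safe, which matches the analysis above.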