From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 982A8C35FFC for ; Sat, 22 Mar 2025 10:28:46 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 705C7280003; Sat, 22 Mar 2025 06:28:44 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 68BE9280001; Sat, 22 Mar 2025 06:28:44 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 52DFA280003; Sat, 22 Mar 2025 06:28:44 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 29BD3280001 for ; Sat, 22 Mar 2025 06:28:44 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 309101CC83F for ; Sat, 22 Mar 2025 10:28:45 +0000 (UTC) X-FDA: 83248813410.12.17E9CB5 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf02.hostedemail.com (Postfix) with ESMTP id 8463580005 for ; Sat, 22 Mar 2025 10:28:43 +0000 (UTC) Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=vH87r9u6; spf=pass (imf02.hostedemail.com: domain of brauner@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=brauner@kernel.org; dmarc=pass (policy=quarantine) header.from=kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1742639323; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=HA7DEVx5myfMEuGEtRBOfQoRu4oNY21XTq+U33c75eI=; b=ZVxE8MJ3T2g6ziZjcWQAEikNvHoEckBwxgVj/fyvD5R6x2gMnYlfTao09OivBXMiB7WAP1 nxBnYEqEQErYqgSFf0uGRef1ffmnI2yTcatiIrxCbGAGW+TG1z2iCysCIYNTSN42LXKnqg VEswTTlYnZszKkgE64rY3In/5XZKEOo= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=vH87r9u6; spf=pass (imf02.hostedemail.com: domain of brauner@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=brauner@kernel.org; dmarc=pass (policy=quarantine) header.from=kernel.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1742639323; a=rsa-sha256; cv=none; b=OSFJZoubClsYS1JrjVWTNzbnfcrYqPavHszuYE6Dn9ZCb2IgMHMoq6g7Kx/uCZHgzG8LzJ RSWQFfUr3heeEh4Z5KNZaWhAixNl08QalRywAgy5gJ3R5VwprwhnoMsdbao8RHTNFoRaFv MMpASONSHxuO0MAVHYR7O5mBBe7vE+s= Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by dfw.source.kernel.org (Postfix) with ESMTP id 72F695C5788; Sat, 22 Mar 2025 10:26:25 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id C785EC4CEDD; Sat, 22 Mar 2025 10:28:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1742639322; bh=/Kqgjm/YTOF5HTAfqwvoVIWHqlsHCxVeqZvK1/hHahw=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=vH87r9u6TDBaV0s5CeeV5jLdWlxe57mSoJCzif8WS6Dnt+5Z6qyQGrm3GAnCObMlQ 26ST0eVp5hihWU605LdcPbeGOVGRPiWfsbZSkd1Q+4RVuIG+9F0iUbvERes9S7mu1H mJE78JFUagXnYVBs1J7JqD4fniXmJwmmv4k1YvNb6wu4OPYwBokbLgEVv+zFCosjnX ABV4AeY1uAf5odch+JTiGZSTrzvioebwRZJSYU3f6m/fa8Bo7VvtH8qBBQmflpUigW R6w8Kas6wjF4zBbe/nXM/kLqoa3ndrZ/DwPa05uvietpOv9Dzm13Ek6UQyQgDyLMgE OOiU4S771rATQ== Date: Sat, 22 Mar 2025 11:28:37 +0100 From: Christian Brauner To: Mateusz Guzik Cc: Kees Cook , Al Viro , Oleg Nesterov , jack@suse.cz, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, syzkaller-bugs@googlegroups.com, syzbot Subject: Re: [syzbot] [fs?] [mm?] KCSAN: data-race in bprm_execve / copy_fs (4) Message-ID: <20250322-erahnen-popkultur-c871983bd560@brauner> References: <67dc67f0.050a0220.25ae54.001f.GAE@google.com> <202503201225.92C5F5FB1@keescook> <20250321-abdecken-infomaterial-2f373f8e3b3c@brauner> <20250322010008.GG2023217@ZenIV> <202503212313.1E55652@keescook> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: X-Rspamd-Queue-Id: 8463580005 X-Rspam-User: X-Rspamd-Server: rspam02 X-Stat-Signature: exgyofi51xgwyjiac1o743w34yrcxcrm X-HE-Tag: 1742639323-665196 X-HE-Meta: U2FsdGVkX1+NjbUOdOrDlyerPxL/1izC7AmPv3MWE1gcPfhr9OujVpviVpLV6UYoE6/uqmhBD70GSyaiYR9IwKpdi5Hu1bgiZpWbOYV3jG6WkcOy9ibMBVtqHl81LWPxziYWXS79fk3a1oXRpdaeK/KwVoaGbzaOGcM8Cumpqm8POXUCdUxchkR2/l1dIcGd7pnCvTALTEmfuODNZCGQ7IcCQPjXOTDIY+2UGVcUR9GMGP30WJaXv/tcX+q/YyrP986y13lC2tUB7B9RmxaeoeLr2OFHfy8Eb+kYZkuTvZfqC7U7Ng7TSxEFaD3L4TLJc7usQnTeQf00GqnRM7f/jP9+2UCtJITxLE/pSb46ZL3v8y8QCsLMyn0FUPrn1+jgcs6heGKGKWP9m+xsvOwV+dqWk8ZkPhnvcjUQv2rmg+AMceuB5nyNZslXmL8eXDZeb/oWjf8ZV8UDxcpiMvfukXSFN2cwGk80jojCKjU369iuDg+uIc3uq7OjhoIpMGfwmt9aaIK36D5U0M/aflclDe7FUqra6wNIaxcnWQcHoeYiJEiKiohvUewXVowkK3lgRR1wjhvQ+RQi0sZrpVAyVMylLFdSZ1yuC6GMWn9jhou1SN91Q3yZEO66nEEb0mS1/6/IfkRr2qG5XHOQ9oZl75k9MovPv6JkZn7963H2aYEo8tvF+95/OxLUguNeluR00ylWYx0hitfNjv4rPt1TcDyAY/WGUEk6J491/Tvsd4xrUDUKqRnR4Cp+bqAuHUV8ItAVbuk6eGVZKJwlGEc3/y6bEs2KJMz5V3HUEGK6IO0u26tW4KXVO+Sc+gLgE1grbjBCHYs/0xkuu6lPuyV7BjCehRZYbVDXbE5DGRxvohAl3D+Ggazg9JV7qG+Pp7AIRuJMRZJoiKU1rYBgSXnaieRFqDvy/7kowuej3dcQldqr1Tm1l6XNPC2Vn0E7N9Y2vQd2ht2oUDhLJ60GJ3R S6CamsgJ DXygYNsTXfg3KRLpINltVJgLN4xGGzbYKQBLnb+Jw/kGypCWYoUSIjhfgDL5wQFziHWm3hMpYEXYhuFUW+lBfucebd26yLm6QR5Xn0XkFxITDuC6F29A2Q7RsJevUyI9B7tgpQDlJhxOtoLTgaUVKDVjYjILTUALSB64LTfeTxBT3WPNbZ/gehMdY2EBVtRFCIR7bLEHMRko8edopsX7IMRtVk4UdsZr5J4QVSfb/gx8PZb+M8pTlZsFbxiU1/ev7H22EktGVA+w3iiCNturH+xbY8eEGuj99Bk7goDbqHbU2KGgddz+yInZtIiQl1PnwU1U3vU/4UY22+BqppGnwA1fXV4E4toOJQ3Gx3DULlqwEv7PhkeLtoQfW2PIsj19fEi565fzXypc3hbSPDQVxVcmV+7Mkb3c5Zh/visXFiIERv8q0n+LaoNxVXXVFfcj8aD+b2Atp6e9cbVMb872FI6PxWt9v/vfG1EaMsd+HvwsX9+g= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Sat, Mar 22, 2025 at 11:15:44AM +0100, Mateusz Guzik wrote: > On Fri, Mar 21, 2025 at 11:26:03PM -0700, Kees Cook wrote: > > On Sat, Mar 22, 2025 at 01:00:08AM +0000, Al Viro wrote: > > > On Fri, Mar 21, 2025 at 09:45:39AM +0100, Christian Brauner wrote: > > > > > > > Afaict, the only way this data race can happen is if we jump to the > > > > cleanup label and then reset current->fs->in_exec. If the execve was > > > > successful there's no one to race us with CLONE_FS obviously because we > > > > took down all other threads. > > > > > > Not really. > > > > Yeah, you found it. Thank you! > > > > > 1) A enters check_unsafe_execve(), sets ->in_exec to 1 > > > 2) B enters check_unsafe_execve(), sets ->in_exec to 1 > > > > With 3 threads A, B, and C already running, fs->users == 3, so steps (1) > > and (2) happily pass. > > > > > 3) A calls exec_binprm(), fails (bad binary) > > > 4) A clears ->in_exec > > > 5) C calls clone(2) with CLONE_FS and spawns D - ->in_exec is 0 > > > > D's creation bumps fs->users == 4. > > > > > 6) B gets through exec_binprm(), kills A and C, but not D. > > > 7) B clears ->in_exec, returns > > > > > > Result: B and D share ->fs, B runs suid binary. > > > > > > Had (5) happened prior to (2), (2) wouldn't have set ->in_exec; > > > had (5) happened prior to (4), clone() would've failed; had > > > (5) been delayed past (6), there wouldn't have been a thread > > > to call clone(). > > > > > > But in the window between (4) and (6), clone() doesn't see > > > execve() in progress and check_unsafe_execve() has already > > > been done, so it hadn't seen the extra thread. > > > > > > IOW, it really is racy. It's a counter, not a flag. > > > > Yeah, I would agree. Totally untested patch: > > > > diff --git a/fs/exec.c b/fs/exec.c > > index 506cd411f4ac..988b8621c079 100644 > > --- a/fs/exec.c > > +++ b/fs/exec.c > > @@ -1632,7 +1632,7 @@ static void check_unsafe_exec(struct linux_binprm *bprm) > > if (p->fs->users > n_fs) > > bprm->unsafe |= LSM_UNSAFE_SHARE; > > else > > - p->fs->in_exec = 1; > > + refcount_inc(&p->fs->in_exec); > > spin_unlock(&p->fs->lock); > > } > > > > @@ -1862,7 +1862,7 @@ static int bprm_execve(struct linux_binprm *bprm) > > > > sched_mm_cid_after_execve(current); > > /* execve succeeded */ > > - current->fs->in_exec = 0; > > + refcount_dec(¤t->fs->in_exec); > > current->in_execve = 0; > > rseq_execve(current); > > user_events_execve(current); > > @@ -1881,7 +1881,7 @@ static int bprm_execve(struct linux_binprm *bprm) > > force_fatal_sig(SIGSEGV); > > > > sched_mm_cid_after_execve(current); > > - current->fs->in_exec = 0; > > + refcount_dec(¤t->fs->in_exec); > > current->in_execve = 0; > > > > return retval; > > The bump is conditional and with this patch you may be issuing > refcount_dec when you declined to refcount_inc. > > A special case where there are others to worry about and which proceeds > with an exec without leaving in any indicators is imo sketchy. > > I would argue it would make the most sense to serialize these execs. Yeah, I tend to agree. And fwiw, I had proposed somewhere else already that we should start restricting thread-group exec to the thread-group leader because subthread exec is about as ugly as it can get. I'd like to add a sysctl for this with the goal of removing this completely in the future. I think there are very few if any legitimate cases for subthread aka non-thread-group leader exec. It not just complicates things it also means two task switch TIDs which is super confusing from userspace (It also complicates pidfd polling but that's an aside.). I'll wip up a patch for this once I get back from travel. > > Vast majority of programs are single-threaded when they exec with an > unshared ->fs, so they don't need to bear any overhead nor complexity > modulo a branch. > > For any fucky case you can park yourself waiting for any pending exec to > finish.