From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 42AB3C3ABD8 for ; Wed, 14 May 2025 15:42:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7012E6B0187; Wed, 14 May 2025 11:42:27 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6AFB86B0189; Wed, 14 May 2025 11:42:27 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5766F6B018A; Wed, 14 May 2025 11:42:27 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 37F9C6B0187 for ; Wed, 14 May 2025 11:42:27 -0400 (EDT) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id A35521A0EDB for ; Wed, 14 May 2025 15:42:28 +0000 (UTC) X-FDA: 83441930376.29.E15D21B Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf02.hostedemail.com (Postfix) with ESMTP id 1109480007 for ; Wed, 14 May 2025 15:42:26 +0000 (UTC) Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=C7JHteJB; spf=pass (imf02.hostedemail.com: domain of kees@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=kees@kernel.org; dmarc=pass (policy=quarantine) header.from=kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1747237347; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Y1TYj5vbgvcoMZr9iA4OJgVSfBE5GA/hIARRmAUceGY=; b=YOFCzGhyUzPC6FlNzuFYDwG1n3GNkZ+xoQ9r2UgDRrQRQNaVfedEiwR7gkXw5AmnYjBqff HCGkjZ083f70aQiIN5qc0aAz7ha/WnJBc+UPdybtWjU19dsk6zQfeR8Em2d5sjJQ0XFJG/ OH/QVObTbxStQm5yaQ+xiZxgT5fWUn4= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=C7JHteJB; spf=pass (imf02.hostedemail.com: domain of kees@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=kees@kernel.org; dmarc=pass (policy=quarantine) header.from=kernel.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1747237347; a=rsa-sha256; cv=none; b=2bLNy/Tgr7t3ZymQOfHkfaRDyKtajp/gK59ImA5dSsWdFGnnTaxGh6CjCVYjdO54DFX1lP iYs4dU7YIDNS8VdoogWQvJESsG+mF2L8SjBXdHUko/xlAhFK7D82qCOZPew+p+J4BIETo4 zmO+fJ3GLoEwbL8Ugm7NCf0xSqUv6wE= Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by dfw.source.kernel.org (Postfix) with ESMTP id A84635C55D9; Wed, 14 May 2025 15:40:08 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 99FEEC4CEE3; Wed, 14 May 2025 15:42:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1747237345; bh=Q/a6cQKz1u3FABKvdOLs1BTFjrBS9xo3VS6yf8DUyaU=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=C7JHteJBR7doFhyrzMuEXW2wmLeyQXbDHsOzSSFmidGMsT2DcWkptVtQbsxDjHHYs Kjw600rDwGpLwIeRA5v2pRV27VJu7SPCqSk1RyTO0TEVlzlAGHMfuWUFZFdAZI0uQ3 TL8zZAjJ9pnksAZRWg7REJS3gBLFihieHfwrJraklAeAUsTXhh95FH4ApDm9Nd260P ml0QvWdSmKwesrXwvMNNAVIzAIpPee0/8P2pwfwfbIK0yhiWLYjqvjDopOQ5dUvQMP 3mnedQJDeqjw+XaNeWbKhoUUdmD0MPQjly5S3hf96VOYuEZy40BDq82gaeX85sF6zK KGXlreeK/N0ng== Date: Wed, 14 May 2025 08:42:22 -0700 From: Kees Cook To: Mateusz Guzik Cc: "Eric W. Biederman" , Jann Horn , Christian Brauner , Jorge Merlino , Alexander Viro , Thomas Gleixner , Andy Lutomirski , Sebastian Andrzej Siewior , Andrew Morton , linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, John Johansen , Paul Moore , James Morris , "Serge E. Hallyn" , Stephen Smalley , Eric Paris , Richard Haines , Casey Schaufler , Xin Long , "David S. Miller" , Todd Kjos , Ondrej Mosnacek , Prashanth Prahlad , Micah Morton , Fenghua Yu , Andrei Vagin , linux-kernel@vger.kernel.org, apparmor@lists.ubuntu.com, linux-security-module@vger.kernel.org, selinux@vger.kernel.org, linux-hardening@vger.kernel.org, oleg@redhat.com Subject: Re: [PATCH 1/2] fs/exec: Explicitly unshare fs_struct on exec Message-ID: <202505140822.6AB755B6@keescook> References: <20221006082735.1321612-1-keescook@chromium.org> <20221006082735.1321612-2-keescook@chromium.org> <20221006090506.paqjf537cox7lqrq@wittgenstein> <86CE201B-5632-4BB7-BCF6-7CB2C2895409@chromium.org> <871pss17hq.fsf@email.froward.int.ebiederm.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Rspamd-Server: rspam10 X-Stat-Signature: rmyk9ganp7r4q6kdd74c9t9exzh4tgqx X-Rspamd-Queue-Id: 1109480007 X-Rspam-User: X-HE-Tag: 1747237346-696941 X-HE-Meta: U2FsdGVkX1+bZqOr/hSymfAyIcnESDPTNQUyUkOoBYbdA+nDUoPJWrOT0CRkP5mvPXnspgxfKmPCVLxx3+QSiZ2rAUJ3b3bYaQuZCC4ljWhdEfMl82FO8sQ2N9WcN+Bw2rBfh2/InJB7sVYjWyoylor/BGEyAwpYQFAdJf5+mHNlhiuMgpIflo9kS7a7p1mkebr8EGsIecCFpoHzbrYx4jcZwrA/0ihX3NY6+ueWTVdWiGcC7E9J+xQUL/Ymovx1wt3TSMZrvqrVRIzF2eTq7onsBJlExhVAlxnDXvhqkY3u+tQkVnBulNVgeukr1eDEA2Sfblqw/OnA7936V9cklspRC3QY1X14BeopS+D/VwjN/bI4fwn1Qi0SrTVOhAz2VKCC8mKsjZw8JksqswNBDSXmiYcF/xHHmih1UbCzP/N0GxGcwKgfCx7wQ95WZ+rIVtG+9JupbXSl912uhdMYZ0ApJFj4UTedzF098lrBraZAynyK6eI/XE+WdwVzfqz76N26/tBJhfnduWSsp0DYlKr5Ja58swqBSk8cZxmkZW3A6AWzeO5hRReffqogdd4iSguTpdNAPhu/bE2JqLoNerm99D795uVp0HiKzrwv0wVil91f0xDEorOgCt5Jp6fc2gwD7VP34oy7zxgd6R5kx0TmEVgkvVBIsg03g8ZHOmMhQLvJikj1kcr3NmvCAHp5x9cs5+tlQeuG6Z+lgpfcL5heGuZZFZmUtc1kj7BIwWO+pKTBsuEnt6EgSyzvlNgAZSgkeQex4sUYPHy57eosJk74ug00gCKl14fI+KdwfBlUlFO7ElVBV6sajrlzuNxjaXf1mNDv61KHG1Vvkouv4F5sKEoilycbfGY+3ii/b0TqulGg1LPN2qTwA3egCqo8KwOFZzhn44AH6iMCtk02kv9gOE6sAr2vfgmFEpgt4f4wULT/PxYDSlBfRC7aYcxzDzWZi0CwWi9XxRD/xAT gn6SFoxg 27jSk7sGqbcVavOwxoO9vhNr1u3/wAdJOjhroChnlzFfb+TL3gdF7CUNvFwKv/m/tUfpU/WiPN+r2bLo= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, May 14, 2025 at 02:03:31AM +0200, Mateusz Guzik wrote: > On Wed, May 14, 2025 at 12:17 AM Eric W. Biederman > wrote: > > > > Jann Horn writes: > > > > > On Tue, May 13, 2025 at 10:57 PM Kees Cook wrote: > > >> On May 13, 2025 6:05:45 AM PDT, Mateusz Guzik wrote: > > >> >Here is my proposal: *deny* exec of suid/sgid binaries if fs_struct is > > >> >shared. This will have to be checked for after the execing proc becomes > > >> >single-threaded ofc. > > >> > > >> Unfortunately the above Chrome helper is setuid and uses CLONE_FS. > > > > > > Chrome first launches a setuid helper, and then the setuid helper does > > > CLONE_FS. Mateusz's proposal would not impact this usecase. > > > > > > Mateusz is proposing to block the case where a process first does > > > CLONE_FS, and *then* one of the processes sharing the fs_struct does a > > > setuid execve(). Linux already downgrades such an execve() to be > > > non-setuid, which probably means anyone trying to do this will get > > > hard-to-understand problems. Mateusz' proposal would just turn this > > > hard-to-debug edgecase, which already doesn't really work, into a > > > clean error; I think that is a nice improvement even just from the > > > UAPI standpoint. > > > > > > If this change makes it possible to clean up the kernel code a bit, even better. > > > > What has brought this to everyone's attention just now? This is > > the second mention of this code path I have seen this week. > > > > There is a syzkaller report concerning ->in_exec handling, for example: > https://lore.kernel.org/all/67dc67f0.050a0220.25ae54.001f.GAE@google.com/#t > > [...] > > It looks like most of the lsm's also test bprm->unsafe. > > > > So I imagine someone could very carefully separate the non-ptrace case > > from the ptrace case but *shrug*. > > > > Perhaps: > > > > if ((is_setid || __cap_gained(permitted, new_old)) && > > ((bprm->unsafe & ~LSM_UNSAFE_PTRACE) || > > !ptracer_capable(current, new->user_ns))) { > > + if (!(bprm->unsafe & LSM_UNSAFE_PTRACE)) { > > + return -EPERM; > > + } > > /* downgrade; they get no more than they had, and maybe less */ > > if (!ns_capable(new->user_ns, CAP_SETUID) || > > (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)) { > > new->euid = new->uid; > > new->egid = new->gid; > > } > > new->cap_permitted = cap_intersect(new->cap_permitted, > > old->cap_permitted); > > } > > > > If that is what you want that doesn't look to scary. I don't think > > it simplifies anything about fs->in_exec. As fs->in_exec is set when > > the processing calling exec is the only process that owns the fs_struct. > > With fs->in_exec just being a flag that doesn't allow another thread > > to call fork and start sharing the fs_struct during exec. > > > > *Shrug* > > > > I don't see why anyone would care. It is just a very silly corner case. > > Well I don't see how ptrace factors into any of this, apart from being > a different case of ignoring suid/sgid. I actually think we might want to expand the above bit of logic to use an explicit tests of each LSM_UNSAFE case -- the merged logic is very difficult to read currently. Totally untested expansion, if I'm reading everything correctly: if (bprm->unsafe && (is_setid || __cap_gained(permitted, new_old))) { bool limit_caps = false; bool strip_eid = false; unsigned int unsafe = bprm->unsafe; /* Check each bit */ if (unsafe & LSM_UNSAFE_PTRACE) { if (!ptracer_capable(current, new->user_ns)) limit_caps = true; unsafe &= ~LSM_UNSAFE_PTRACE; } if (unsafe & LSM_UNSAFE_SHARE) { limit_caps = true; if (!ns_capable(new->user_ns, CAP_SETUID)) strip_eid = true; unsafe &= ~LSM_UNSAFE_SHARE; } if (unsafe & LSM_UNSAFE_NO_NEW_PRIVS) { limit_caps = true; if (!ns_capable(new->user_ns, CAP_SETUID)) strip_eid = true; unsafe &= ~LSM_UNSAFE_NO_NEW_PRIVS; } if (WARN(unsafe, "Unhandled LSM_UNSAFE flag: %u?!\n", unsafe)) return -EINVAL; if (limit_caps) { new->cap_permitted = cap_intersect(new->cap_permitted, old->cap_permitted); } if (strip_eid) { new->euid = new->uid; new->egid = new->gid; } } > I can agree the suid/sgid situation vs CLONE_FS is a silly corner > case, but one which needs to be handled for security reasons and which > currently has weirdly convoluted code to do it. > > The intent behind my proposal is very much to get the crapper out of > the way in a future-proof and simple manner. > > In check_unsafe_exec() you can find a nasty loop over threads in the > group to find out if the fs struct is used by anyone outside of said > group. Since fs struct users are not explicitly tracked and any of > them can have different creds than the current thread, the kernel opts > to ignore suid/sgid if there are extra users found (for security > reasons). The loop depends on no new threads showing up as the list is > being walked, to that end copy_fs() can transiently return an error if > it spots ->in_exec. > > The >in_exec field is used as a boolean/flag, but parallel execs using > the same fs struct from different thread groups don't look serialized. > This is supposed to be fine as in this case ->in_exec is not getting > set to begin with, but it gets unconditionally unset on all execs. > > And so on. It's all weird af. 100% :) -- Kees Cook