From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B15B4C3ABC3 for ; Wed, 14 May 2025 00:03:49 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5D97C6B0089; Tue, 13 May 2025 20:03:48 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 586B36B008A; Tue, 13 May 2025 20:03:48 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 427CB6B008C; Tue, 13 May 2025 20:03:48 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 205DA6B0089 for ; Tue, 13 May 2025 20:03:48 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 051E3161A17 for ; Wed, 14 May 2025 00:03:47 +0000 (UTC) X-FDA: 83439564936.04.66A7C8D Received: from mail-ed1-f51.google.com (mail-ed1-f51.google.com [209.85.208.51]) by imf06.hostedemail.com (Postfix) with ESMTP id E4D85180003 for ; Wed, 14 May 2025 00:03:45 +0000 (UTC) Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=YeoP45Cn; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf06.hostedemail.com: domain of mjguzik@gmail.com designates 209.85.208.51 as permitted sender) smtp.mailfrom=mjguzik@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1747181026; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=6lqSoGgVBEs4e77J4dvCBv01iW/I9gHcChJC6DVoY/E=; b=KN3j46B2cyXQ7j93J80/Gi8djWkOW3ApFjYY++cQmZ+tM+ILuGKKMmFv2glAPtgm3k2rpp tG+Elo/1yWAVjtifHx90qnGjQIFq4z1y87hKj8HBe4p7AJV4ugQ6TPnhFqq9U0qQP7I4w8 ToxfNcFraOiNNxCGkxaQEX/oV4DiNIU= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1747181026; a=rsa-sha256; cv=none; b=x4Jg0xkbHgURfO1uUjqJC3QjjlFErxXrMeE+LwA3iau6a8DPlEaRj4uW5ZqrZawBB54WVZ wJgK6DVJLtHgk5edu5M62bdXlz7JlT0slV/sjip6MacqyS1ZLwF9UuwK7AyZfqIcuq59wb 97c3vk1dhvnt4qwKtrb0gsLQy0Z8xcY= ARC-Authentication-Results: i=1; imf06.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=YeoP45Cn; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf06.hostedemail.com: domain of mjguzik@gmail.com designates 209.85.208.51 as permitted sender) smtp.mailfrom=mjguzik@gmail.com Received: by mail-ed1-f51.google.com with SMTP id 4fb4d7f45d1cf-5f6fb95f431so677758a12.0 for ; Tue, 13 May 2025 17:03:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1747181024; x=1747785824; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=6lqSoGgVBEs4e77J4dvCBv01iW/I9gHcChJC6DVoY/E=; b=YeoP45CnvUIq17qror3oO9Siobpva9laXG+lW3zZ+V/WF/bYwAwuaBeoe6L7xElEmj wigLY+xAGsnm9W7GGRV54ZQKCtamf/Hug8X7GEZXnXW+FWl1QYsppg6pXW4sqtrPtO0/ soButy+L1PGVrPJnVrY5gIA6IEna0SWXKX58nCE8xHOeN277EOoR6q36WAuLGIdAPOCv NvXCOzs7Y3vP1WwiNOup2YoTuuD4ZZzuVefa14lTGEm7QyqJEIwOOxwOeItxzxChPEUW z1b5j+sSTi63xRN4ftILv6sVhBup7xedjI1avaiyWK+QaRWSPGKW+OIyyWXLL2EBEXdq hSlQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1747181024; x=1747785824; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=6lqSoGgVBEs4e77J4dvCBv01iW/I9gHcChJC6DVoY/E=; b=hB9axAbnR2VbhuOcsFKWr8xLr8QNHOp3TPH/9oXi3NtXg0+XIJOIiEURKTy8h6ewOo ZgRt+GauvWs8JmnwkAfvCMYstLEaft33aLfI5XGYGUS891aCaFwGp5r8BIGdvWaZJR19 1OMf9CYiZ1MFUDiV6Wp6ostSJJ9PLfmpDKbHIqrMr9Sws7BuRtP919mbvUoKNrfDCHHJ 96fH/CMEZ0K/Vma0MKqUOOf6/mNe5dKyU4VdfBDhL5oNT4lZRGN+/gl1PXDLzYxUgdvX Jn/q7P2AiKa/rznfZP1JCPmvAtd42JCov/9j9xM4x465HK/rooNd3tGmG/HJaOYNK0aO 1syQ== X-Forwarded-Encrypted: i=1; AJvYcCUPCk4xWqRLtRicD5aR1vufBX1pSDQIHx0J6BQfqWsn4HQku4pCQt7NVQbIh+ecdZqdxxUTaSkQWA==@kvack.org X-Gm-Message-State: AOJu0YyTqaL0R7FvaFgkkchupCsKv3Or/HdzFOLKn2EsFCdRamxRlmNV RWIjPLmd2UB+NSP1kf+kG9+/Hx5oeVCkUHSJP6tzgDiXA4tVjF7ovygFo4Zytyyu2URnmLynSP1 yaoHn60xochBv60QFFv4Gs5dIPIU= X-Gm-Gg: ASbGncse5Z5k4dASUXx4rrFPn1j2k/qFdkIcYn4d/6yNRniCopBOzOBX4iVyAkVGpLP kJMxAqKYmVgMfWn4bluxpJycJXe+qdT9jmpRaZl09yJYs5o6lsV4ul6h67ttK6KFUGTre7eCo2H T6aOwMSPQeMTq4amtMMIJCPSv5vqst6Jp91WlcO/yuwQ== X-Google-Smtp-Source: AGHT+IGRlSkJ0UH4AJ6sv5ErNW8p3B/IKPPQ+Oz8ZXhFjDVyTM4z7WXblF2mhP+ITGzr1p9iVxs23uJFZScuNAmdKVU= X-Received: by 2002:a17:907:1785:b0:ad2:cce:8d5e with SMTP id a640c23a62f3a-ad4f71dc930mr154641766b.7.1747181023932; Tue, 13 May 2025 17:03:43 -0700 (PDT) MIME-Version: 1.0 References: <20221006082735.1321612-1-keescook@chromium.org> <20221006082735.1321612-2-keescook@chromium.org> <20221006090506.paqjf537cox7lqrq@wittgenstein> <86CE201B-5632-4BB7-BCF6-7CB2C2895409@chromium.org> <871pss17hq.fsf@email.froward.int.ebiederm.org> In-Reply-To: <871pss17hq.fsf@email.froward.int.ebiederm.org> From: Mateusz Guzik Date: Wed, 14 May 2025 02:03:31 +0200 X-Gm-Features: AX0GCFst7wU9kVixjwdrqiox0NLTh4DbO3zQ5nACeNFmn77IHwc5Pm1sBQG733I Message-ID: Subject: Re: [PATCH 1/2] fs/exec: Explicitly unshare fs_struct on exec To: "Eric W. Biederman" Cc: Jann Horn , Kees Cook , Kees Cook , Christian Brauner , Jorge Merlino , Alexander Viro , Thomas Gleixner , Andy Lutomirski , Sebastian Andrzej Siewior , Andrew Morton , linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, John Johansen , Paul Moore , James Morris , "Serge E. Hallyn" , Stephen Smalley , Eric Paris , Richard Haines , Casey Schaufler , Xin Long , "David S. Miller" , Todd Kjos , Ondrej Mosnacek , Prashanth Prahlad , Micah Morton , Fenghua Yu , Andrei Vagin , linux-kernel@vger.kernel.org, apparmor@lists.ubuntu.com, linux-security-module@vger.kernel.org, selinux@vger.kernel.org, linux-hardening@vger.kernel.org, oleg@redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: dofmr9kj87pykeuxouukoiokmjx3pg6a X-Rspam-User: X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: E4D85180003 X-HE-Tag: 1747181025-385188 X-HE-Meta: U2FsdGVkX1/pswTBFdiaQej8rAkgcE9ZrVZVJnBn2YaFpK4wm13CZ0YtY90yNX+oPuC0FljNDBoJRhTOJFkWhWGVLSkvvsSjnz6ouYfyZ3adWK3K/EhtoZs5mRYho4P025pFXc8yr6beh15DTG2eixqXDNADVgIwaAitU8LPaa4OYXhnlFQtSz8iOY3hAHRj/pAKX3diYsf/k6KPQJ3j7W1n+mu4WNtiwc4QyUAL8Ot8clCvCqxiqSqHLOJzQNin8IquYDEibcbqr21GLA8AYNcws/zjAN4uu1bWUDP2NudhlJvRnwGmO7YcuaICEPylIIotpJ1DTgu1owzLsOPsPmidqwPIAZBdetHtjlRZyr7Xv6ZO8pa24EXFhlzdB6p+6ej9HYlaJiCjrm2JH3rWjNF8k9CYpRDOvYKAkWPci5rnuxeTV8Jcyn/v/sQ8Bo4H99GclZ+gmzQfhwUjaDIYKnhjnEd6Pz7Ikv6DOnKQMN62X20PnE6gqZritqQm5q1zkrS5SS2zsIEw8XFL6EnslFB+Jz4hiTlUUbN3jii99JJUDK9bzkd+72rbVH56e1Tyf4P4y6Qgd3JUR0+/RGJXkK/oNZZTJzJuOGWfertov+/OR/TOuG0Fv9DCqW2vAEjcPXdIlVfOxrB/MPnfKJLpEyTMpATi9Eg/1FSuTL9Ddx/6NbbEtn7+BjnJx2F46yAi15eHI2batOFe/PXPaNNS647d8m3l9W33IhfIHh3ST2fLMlcXZL5AqdEHbZWTWM3N7ijY5qgdaNNjGBKxBTJKWbyuESCoI0ujpHtym6VspPz8OsRQ3tCPMCf/btY8iWsDBvy+cR9rE6/pHeSQp5r+6xtcLqjj7BBMw4qNZZj0nQStBm20IyDXeC3Djym0eZU/Vf61evTQsMS9t3JCCnWoma7Jt+K/PrkR9LeJ36DP/UPLDh+Pl/32WmK/J1djNoL9dJNaakwOGxpcxA6EfG8 EhCszB2y DTjgmltylcs2wp4/+ELq1t8owNOpUwlctBnm3XThuP8nDQuzgei5ZaJeLRR9k+3mmLcFrgL+H090mcaf8Yp6Kuo2rD1F++hpp4lAODrY+0eFojTZsB6gFEpROR6V34UIxqzVTaMyQKyI0ROje+PFkD/RNgEpcmWWOGYGDSCdZTqrZwVQAikMqbmEvPtnStalFZEN2T4YePbehR6QF9UHcl93vstc/A0uTuu1+ILxih3HG7YttwSgzsE7s2d0968+WAennl9fi0xk0EdIWY5IS5VA2UABJtCiPRfUJpCyMhqbpzmp0xcOd02FaznJna8mljQWB/QkfoSU3j0l2vxHvF93eeWtAPZqBrRh011kUJ+pnLfDhb01MZVV1bjkvBcFM0hJVsz5v4vRAEQeZQ/fFv+J+INmo4QEl/il8vUjOfIX1Axn4YcFvqw6IP3co5d9QsIK9hGrUpdxKPHkFrxVhCInPEdUYqWeZPH3S23JT86OrUrywW7d83CSk+zmKWhqd2Q81n4hs8XXiof0AK8BgW2k9nQCWnMrk1Sa5ITzs1/wviLtsn0ABnZ8fYA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, May 14, 2025 at 12:17=E2=80=AFAM Eric W. Biederman wrote: > > Jann Horn writes: > > > On Tue, May 13, 2025 at 10:57=E2=80=AFPM Kees Cook wr= ote: > >> On May 13, 2025 6:05:45 AM PDT, Mateusz Guzik wrot= e: > >> >Here is my proposal: *deny* exec of suid/sgid binaries if fs_struct i= s > >> >shared. This will have to be checked for after the execing proc becom= es > >> >single-threaded ofc. > >> > >> Unfortunately the above Chrome helper is setuid and uses CLONE_FS. > > > > Chrome first launches a setuid helper, and then the setuid helper does > > CLONE_FS. Mateusz's proposal would not impact this usecase. > > > > Mateusz is proposing to block the case where a process first does > > CLONE_FS, and *then* one of the processes sharing the fs_struct does a > > setuid execve(). Linux already downgrades such an execve() to be > > non-setuid, which probably means anyone trying to do this will get > > hard-to-understand problems. Mateusz' proposal would just turn this > > hard-to-debug edgecase, which already doesn't really work, into a > > clean error; I think that is a nice improvement even just from the > > UAPI standpoint. > > > > If this change makes it possible to clean up the kernel code a bit, eve= n better. > > What has brought this to everyone's attention just now? This is > the second mention of this code path I have seen this week. > There is a syzkaller report concerning ->in_exec handling, for example: https://lore.kernel.org/all/67dc67f0.050a0220.25ae54.001f.GAE@google.com/#t > AKA: security/commoncap.c:cap_bprm_creds_from_file(...) > > ... > > /* Don't let someone trace a set[ug]id/setpcap binary with the re= vised > > * credentials unless they have the appropriate permit. > > * > > * In addition, if NO_NEW_PRIVS, then ensure we get no new privs. > > */ > > is_setid =3D __is_setuid(new, old) || __is_setgid(new, old); > > > > if ((is_setid || __cap_gained(permitted, new, old)) && > > ((bprm->unsafe & ~LSM_UNSAFE_PTRACE) || > > !ptracer_capable(current, new->user_ns))) { > > /* downgrade; they get no more than they had, and maybe l= ess */ > > if (!ns_capable(new->user_ns, CAP_SETUID) || > > (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)) { > > new->euid =3D new->uid; > > new->egid =3D new->gid; > > } > > new->cap_permitted =3D cap_intersect(new->cap_permitted, > > old->cap_permitted); > > } > > The actual downgrade is because a ptrace'd executable also takes > this path. > > I have seen it argued rather forcefully that continuing rather than > simply failing seems better in the ptrace case. > > In general I think it can be said this policy is "safe". AKA we don't > let a shared fs struct confuse privileged applications. So nothing > to panic about. > > It looks like most of the lsm's also test bprm->unsafe. > > So I imagine someone could very carefully separate the non-ptrace case > from the ptrace case but *shrug*. > > Perhaps: > > if ((is_setid || __cap_gained(permitted, new_old)) && > ((bprm->unsafe & ~LSM_UNSAFE_PTRACE) || > !ptracer_capable(current, new->user_ns))) { > + if (!(bprm->unsafe & LSM_UNSAFE_PTRACE)) { > + return -EPERM; > + } > /* downgrade; they get no more than they had, and maybe l= ess */ > if (!ns_capable(new->user_ns, CAP_SETUID) || > (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)) { > new->euid =3D new->uid; > new->egid =3D new->gid; > } > new->cap_permitted =3D cap_intersect(new->cap_permitted, > old->cap_permitted); > } > > If that is what you want that doesn't look to scary. I don't think > it simplifies anything about fs->in_exec. As fs->in_exec is set when > the processing calling exec is the only process that owns the fs_struct. > With fs->in_exec just being a flag that doesn't allow another thread > to call fork and start sharing the fs_struct during exec. > > *Shrug* > > I don't see why anyone would care. It is just a very silly corner case. Well I don't see how ptrace factors into any of this, apart from being a different case of ignoring suid/sgid. I can agree the suid/sgid situation vs CLONE_FS is a silly corner case, but one which needs to be handled for security reasons and which currently has weirdly convoluted code to do it. The intent behind my proposal is very much to get the crapper out of the way in a future-proof and simple manner. In check_unsafe_exec() you can find a nasty loop over threads in the group to find out if the fs struct is used by anyone outside of said group. Since fs struct users are not explicitly tracked and any of them can have different creds than the current thread, the kernel opts to ignore suid/sgid if there are extra users found (for security reasons). The loop depends on no new threads showing up as the list is being walked, to that end copy_fs() can transiently return an error if it spots ->in_exec. The >in_exec field is used as a boolean/flag, but parallel execs using the same fs struct from different thread groups don't look serialized. This is supposed to be fine as in this case ->in_exec is not getting set to begin with, but it gets unconditionally unset on all execs. And so on. It's all weird af. Initially I was thinking about serializing all execs using a given fs_struct to bypass majority of the fuckery, but that's some churn to add and it still leaves possible breakage down the road -- should this unsafe sharing detection ever become racing nobody will find out until the bad guys have their turn with it. While unconditional unsharing turns out to be a no-go because of chrome, one can still do postpone detection until after the caller is single-threaded. By that time, if this is only the that thread and fs_struct has ->users =3D=3D 1, we know there is nobody sharing the struct or racing to add a ref to it. This allows treating ->users as a regular refcount, removes the weird loop over threads and removes the (at best misleading) ->in_exec field. With this in place it becomes trivial to also *deny* suid/sgid exec instead of trying to placate it. If you are sharing fs and are execing a binary in the first place, things are a little fishy. But if you are execing a suid/sgid, the kernel has to ignore the bit so either you are doing something wrong or are trying to exploit a bug. In order to sort this crapper out, I think one can start with a runtime tunable and a once-per-boot printk stating it denied such an exec (and stating how to bring it back). To be removed some time after hitting LTS perhaps. --=20 Mateusz Guzik