Date: Sat, 19 Aug 2023 12:50:39 +1000
From: Aleksa Sarai
To: Jeff Xu
Cc: Andrew Morton, Shuah Khan, Kees Cook, Daniel Verkamp, Christian Brauner, Dominique Martinet, stable@vger.kernel.org, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v2 0/5] memfd: cleanups for vm.memfd_noexec
Message-ID: <20230819.022033-joyful.ward.quirky.defender-lpHlCTglJUSs@cyphar.com>
References: <20230814-memfd-vm-noexec-uapi-fixes-v2-0-7ff9e3e10ba6@cyphar.com>

On 2023-08-15, Jeff Xu wrote:
> On Mon, Aug 14, 2023 at 1:41 AM Aleksa Sarai wrote:
> >
> > The most critical issue with vm.memfd_noexec=2 (the fact that passing
> > MFD_EXEC would bypass it entirely[1]) has been fixed in Andrew's
> > tree[2], but there are still some outstanding issues that need to be
> > addressed:
> >
> > * vm.memfd_noexec=2 shouldn't reject old-style memfd_create(2) syscalls
> >   because it will make it far too difficult to ever migrate. Instead it
> >   should imply MFD_NOEXEC_SEAL.
> >
> > * The dmesg warnings are pr_warn_once(), which on most systems means
> >   that they will be used up by systemd or some other boot process and
> >   userspace developers will never see them.
> >
> >   - For the !(flags & (MFD_EXEC | MFD_NOEXEC_SEAL)) case, outputting a
> >     rate-limited message to the kernel log is necessary to tell
> >     userspace that they should add the new flags.
> >
> >     Arguably the ideal way to deal with the spam concern[3,4] while
> >     still prompting userspace to switch to the new flags would be to
> >     only log the warning once per task or something similar. However,
> >     adding something to task_struct for tracking this would be
> >     needless bloat for a single pr_warn_ratelimited().
> >
> >     So just switch to pr_info_ratelimited() to avoid spamming the log
> >     with something that isn't a real warning. There's lots of
> >     info-level stuff in dmesg, so it seems really unlikely that this
> >     would be an actual problem. Most programs are already switching to
> >     the new flags anyway.
> >
> >   - For the vm.memfd_noexec=2 case, we need to log a warning for every
> >     failure because otherwise userspace will have no idea why their
> >     previously working program started returning -EACCES (previously
> >     -EINVAL) from memfd_create(2). pr_warn_once() is simply wrong here.
> >
> > * The ratcheting mechanism for vm.memfd_noexec makes it incredibly
> >   unappealing for most users to enable the sysctl because enabling it
> >   on &init_pid_ns means you need a system reboot to unset it. Given the
> >   actual security threat being protected against, restricting
> >   CAP_SYS_ADMIN users in this way makes little sense.
> >
> >   The argument for this ratcheting by the original author was that it
> >   allows you to have a hierarchical setting that cannot be unset by
> >   child pidnses, but this is not accurate -- changing the parent
> >   pidns's vm.memfd_noexec setting to be more restrictive didn't affect
> >   children.
> >
> That is not exactly what I said though.

Sorry, I probably should've phrased this as "one of the main arguments".
In the last discussion thread we had on v1 of this patch, it was my
impression that this was the primary sticking point.

> From ChromeOS's position, allowing downgrade is less secure, and this
> setting was designed to be set at startup/reboot time from the very
> beginning, such as on the kernel command line or as part of the
> container runtime environment (which gets passed to the sandboxed
> container).

If this had been implemented as a cmdline flag, it would be completely
reasonable that you need to reboot to change it.
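
To make the ratcheting behaviour under discussion concrete, here is a small userspace C model of it. This is a hypothetical sketch, not the kernel's proc handler: model_sysctl_write() and struct model_pidns are made-up names, and returning -EINVAL for a rejected downgrade is an assumption modelled on a min/max range check. It shows that with ratcheting the value can only ever go up, and that a child pid namespace which merely copies its parent's value at creation time (as the quoted cover letter describes) is unaffected by later changes to the parent.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code: a write handler for a
 * vm.memfd_noexec-style setting, with and without ratcheting. */
static int model_sysctl_write(int *value, int requested, bool ratchet)
{
    /* Only 0, 1 and 2 are meaningful values for vm.memfd_noexec. */
    if (requested < 0 || requested > 2)
        return -EINVAL;
    /* Ratcheting: the current value acts as the minimum, so once the
     * setting has been raised it can never be lowered again. */
    if (ratchet && requested < *value)
        return -EINVAL;
    *value = requested;
    return 0;
}

/* Hypothetical pid namespace that copies its parent's value once,
 * at creation time; later parent changes are not propagated. */
struct model_pidns {
    int memfd_noexec;
};

static struct model_pidns model_pidns_create_child(const struct model_pidns *parent)
{
    struct model_pidns child = { .memfd_noexec = parent->memfd_noexec };
    return child;
}
```

Because the child's value is only a snapshot, raising the parent to 2 afterwards leaves an existing child at its old value, so in this model the ratchet alone does not provide a hierarchical guarantee.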
However, it was implemented as a sysctl, and the behaviour of sysctls is
that admins can (generally) change them after they've been set -- even
for security-related sysctls such as the fs.protected_* sysctls. The only
counter-example I know of is the YAMA one, and if I'm being honest I think
that behaviour is also weird.

> I understand your viewpoint: from another distribution's point of view,
> the original design might be too restrictive, so if the kernel wants
> to weigh ease of administration more heavily, I'm OK with your approach.
> Though it is less secure for ChromeOS - i.e. we do try to prevent
> arbitrary code execution as much as possible, even for CAP_SYS_ADMIN.
> And with this change, it is less secure and one more possibility for
> us to consider.

FWIW, I still think the threat model where memfd_create(MFD_EXEC) needs
to be protected against a &init_user_ns-privileged CAP_SYS_ADMIN process
being tricked into writing a sysctl doesn't really make sense for the
vast majority of systems (if any).

If ChromeOS really wants the old vm.memfd_noexec=2 behaviour to be
enforced, this can be done with a very simple seccomp filter. If applied
to pid1, this would also not be possible to unset without a reboot.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
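
As a rough illustration of the seccomp approach mentioned above (a sketch, not code from this thread or from ChromeOS), a filter approximating the original vm.memfd_noexec=2 semantics could make any memfd_create(2) call that lacks MFD_NOEXEC_SEAL fail with EACCES. It assumes x86-64, where the little-endian low word of args[1] holds the flags; it kills foreign-arch and x32-ABI callers outright rather than handling them; and the helper name install_memfd_noexec_filter() is hypothetical. Since seccomp filters are inherited across fork() and execve() and cannot be removed, installing it early in pid1 gives the "only a reboot can unset it" property.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U /* value from <linux/memfd.h> (Linux 6.3+) */
#endif
#ifndef __X32_SYSCALL_BIT
#define __X32_SYSCALL_BIT 0x40000000U
#endif

/* Hypothetical helper: make memfd_create(2) fail with EACCES unless
 * MFD_NOEXEC_SEAL is passed. Returns 0 on success, -1 with errno set. */
static int install_memfd_noexec_filter(void)
{
#if !defined(__x86_64__)
    errno = ENOSYS; /* this sketch only encodes the x86-64 syscall ABI */
    return -1;
#else
    struct sock_filter filter[] = {
        /* Kill callers using a different syscall ABI (e.g. i386). */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* Kill x32 syscalls too, so they cannot bypass the check below. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, __X32_SYSCALL_BIT, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* Every syscall other than memfd_create(2) is allowed. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_memfd_create, 0, 3),
        /* Load the low 32 bits of the flags argument. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[1])),
        BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, MFD_NOEXEC_SEAL, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EACCES & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Callers without CAP_SYS_ADMIN must set no_new_privs first. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
#endif
}
```

Only the MFD_NOEXEC_SEAL bit is inspected here; a real deployment would likely want to decide how MFD_EXEC and unknown flags should be treated as well.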