Date: Fri, 14 Jul 2023 10:07:21 +1000
From: Aleksa Sarai
To: Andrew Morton, Jeff Xu, YueHaibing, Luis Chamberlain, Kees Cook,
 Daniel Verkamp
Cc: linux-mm@kvack.org, Dominique Martinet, Christian Brauner,
 stable@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 2/3] memfd: remove racheting feature from vm.memfd_noexec
References: <20230713143406.14342-1-cyphar@cyphar.com>
 <20230713143406.14342-3-cyphar@cyphar.com>
In-Reply-To: <20230713143406.14342-3-cyphar@cyphar.com>

On 2023-07-14, Aleksa Sarai wrote:
> This sysctl has the very unusual behaviour of not allowing any user (even
> CAP_SYS_ADMIN) to reduce the restriction setting, meaning that if you
> were to set this sysctl to a more restrictive option in the host pidns
> you would need to reboot your machine in order to reset it.
>
> The justification given in [1] is that this is a security feature and
> thus it should not be possible to disable. Aside from the fact that we
> have plenty of security-related sysctls that can be disabled after being
> enabled (fs.protected_symlinks for instance), the protection provided by
> the sysctl is to stop users from being able to create a binary and then
> execute it.
> A user with CAP_SYS_ADMIN can trivially do this without
> memfd_create(2):
>
>   % cat mount-memfd.c
>   #include
>   #include
>   #include
>   #include
>   #include
>   #include
>
>   #define SHELLCODE "#!/bin/echo this file was executed from this totally private tmpfs:"
>
>   int main(void)
>   {
>       int fsfd = fsopen("tmpfs", FSOPEN_CLOEXEC);
>       assert(fsfd >= 0);
>       assert(!fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0));
>
>       int dfd = fsmount(fsfd, FSMOUNT_CLOEXEC, 0);
>       assert(dfd >= 0);
>
>       int execfd = openat(dfd, "exe", O_CREAT | O_RDWR | O_CLOEXEC, 0782);

                                                                      0777

Oops. I must've garbled something when copying from my test program.

>       assert(execfd >= 0);
>       assert(write(execfd, SHELLCODE, strlen(SHELLCODE)) == strlen(SHELLCODE));
>       assert(!close(execfd));
>
>       char *execpath = NULL;
>       char *argv[] = { "bad-exe", NULL }, *envp[] = { NULL };
>       execfd = openat(dfd, "exe", O_PATH | O_CLOEXEC);
>       assert(execfd >= 0);
>       assert(asprintf(&execpath, "/proc/self/fd/%d", execfd) > 0);
>       assert(!execve(execpath, argv, envp));
>   }
>   % ./mount-memfd
>   this file was executed from this totally private tmpfs: /proc/self/fd/5
>   %
>
> Given that it is possible for CAP_SYS_ADMIN users to create executable
> binaries without memfd_create(2) and without touching the host
> filesystem (not to mention the many other things a CAP_SYS_ADMIN process
> would be able to do that would be equivalent or worse), it seems strange
> to cause a fair amount of headache to admins when there doesn't appear
> to be an actual security benefit to blocking this.
>
> It should be noted that with this change, programs that can do an
> unprivileged unshare(CLONE_NEWUSER) would be able to create an
> executable memfd even if their current pidns didn't allow it.
> However, the same sample program above can also be used in this
> scenario, meaning that even with this consideration, blocking
> CAP_SYS_ADMIN makes little sense:
>
>   % unshare -rm ./mount-memfd
>   this file was executed from this totally private tmpfs: /proc/self/fd/5
>
> This simply further reinforces that locked-down environments need to
> disallow CLONE_NEWUSER for unprivileged users (as is already the case in
> most container environments).
>
> [1]: https://lore.kernel.org/all/CABi2SkWnAgHK1i6iqSqPMYuNEhtHBkO8jUuCvmG3RmUB5TKHJw@mail.gmail.com/
>
> Cc: Dominique Martinet
> Cc: Christian Brauner
> Cc: stable@vger.kernel.org # v6.3+
> Fixes: 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
> Signed-off-by: Aleksa Sarai
> ---
>  kernel/pid_sysctl.h | 7 -------
>  1 file changed, 7 deletions(-)
>
> diff --git a/kernel/pid_sysctl.h b/kernel/pid_sysctl.h
> index b26e027fc9cd..8a22bc29ebb4 100644
> --- a/kernel/pid_sysctl.h
> +++ b/kernel/pid_sysctl.h
> @@ -24,13 +24,6 @@ static int pid_mfd_noexec_dointvec_minmax(struct ctl_table *table,
>  	if (ns != &init_pid_ns)
>  		table_copy.data = &ns->memfd_noexec_scope;
>  
> -	/*
> -	 * set minimum to current value, the effect is only bigger
> -	 * value is accepted.
> -	 */
> -	if (*(int *)table_copy.data > *(int *)table_copy.extra1)
> -		table_copy.extra1 = table_copy.data;
> -
>  	return proc_dointvec_minmax(&table_copy, write, buf, lenp, ppos);
>  }

I also have a patch to properly tie the sysctl to the pid namespace rather
than having a global sysctl that magically has its value changed in this
pid_mfd_noexec_dointvec_minmax(), and another to do the same for the
other pidns-tied sysctl (kernel.ns_last_pid), but I'm not sure whether
it's needed. It does make vm.memfd_noexec a bit cleaner, but because the
two sysctls are in different tables you can't register them together
AFAICS, which means a bunch of needless duplication.
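
For illustration only, here is a rough userspace model of what the removed
hunk does to writes. This is a sketch, not kernel code: scope_write(),
scope_demo(), SCOPE_MIN and SCOPE_MAX are hypothetical names, and the 0..2
range is an assumption standing in for whatever extra1/extra2 point at.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed static bounds, standing in for the table's extra1/extra2. */
#define SCOPE_MIN 0
#define SCOPE_MAX 2

/*
 * Model of one write to the sysctl. With ratchet=true the lower bound is
 * raised to the current value, as the removed hunk did. With ratchet=false
 * only the static bounds apply. Returns false when the write is rejected.
 */
static bool scope_write(int *cur, int val, bool ratchet)
{
	int min = SCOPE_MIN;

	if (ratchet && *cur > min)
		min = *cur;
	if (val < min || val > SCOPE_MAX)
		return false;	/* out of range, value left unchanged */
	*cur = val;
	return true;
}

/* Self-check contrasting the two behaviours. */
static void scope_demo(void)
{
	int cur = 0;

	assert(scope_write(&cur, 2, true));	/* raising always works */
	assert(!scope_write(&cur, 0, true));	/* ratchet: cannot lower */
	assert(cur == 2);
	assert(scope_write(&cur, 0, false));	/* no ratchet: lowering works */
	assert(cur == 0);
}
```

In this model, once the value has been raised the ratcheting variant has no
way back down short of starting over, which is the reboot problem described
in the commit message.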
>  
> -- 
> 2.41.0
>

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH