Date: Thu, 3 Aug 2023 07:48:48 +1000
From: Aleksa Sarai
To: Jeff Xu
Cc: Jeff Xu, Andrew Morton, Shuah Khan, Kees Cook, Daniel Verkamp,
	Luis Chamberlain, YueHaibing, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-hardening@vger.kernel.org
Subject: Re: [RFC PATCH 0/3] memfd: cleanups for vm.memfd_noexec
Message-ID: <20230802.213938-adept.raisins.dental.revival-IIlYKYPmegSa@cyphar.com>
References: <20230713143406.14342-1-cyphar@cyphar.com>
	<20230801.032503-medium.noises.extinct.omen-CStYZUqcNLCS@cyphar.com>

On 2023-08-02, Jeff Xu wrote:
> > > > > > * vm.memfd_noexec=2 shouldn't reject old-style memfd_create(2) syscalls
> > > > > >   because it will make it far too difficult to ever migrate. Instead it
> > > > > >   should imply MFD_EXEC.
> > > > > >
> > > > > Though the purpose of memfd_noexec=2 is not to help with migration -
> > > > > but to disable creation of executable memfd for the current system/pid
> > > > > namespace.
> > > > > During the migration, vm.memfd_noexec = 1 helps overwriting for
> > > > > unmigrated user code as a temporary measure.
> > > >
> > > > My point is that the current behaviour for =2 means that nobody other
> > > > than *maybe* ChromeOS will ever be able to use it because it requires
> > > > auditing every program on the system. In fact, it's possible even
> > > > ChromeOS will run into issues given that one of the arguments made for
> > > > the nosymfollow mount option was that auditing all of ChromeOS to
> > > > replace every open with RESOLVE_NO_SYMLINKS would be too much effort[1]
> > > > (which I agreed with).
> > > > Maybe this is less of an issue with
> > > > memfd_create(2) (which is much newer than open(2)) but it still seems
> > > > like a lot of busy work when the =1 behaviour is entirely sane even in
> > > > the strict threat model that =2 is trying to protect against.
> > > >
> > > It can also be a container (that have all memfd_create migrated to new API)
> >
> > If ChromeOS would struggle to rewrite all of the libraries they use,
> > containers are in even worse shape -- most container users don't have a
> > complete list of every package installed in a container, let alone the
> > ability to audit whether they pass a (no-op) flag to memfd_create(2) in
> > every codepath.
> >
> > > One option I considered previously was "=2" would do overwrite+block,
> > > and "=3" just block. But then I worry that applications won't have
> > > motivation to ever change their existing code, the setting will
> > > forever stay at "=2", making "=3" even more impossible to ever be used
> > > system side.
> >
> > What is the downside of overwriting? Backwards-compatibility is a very
> > important part of Linux -- being able to use old programs without having
> > to modify them is incredibly important. Yes, this behaviour is opt-in --
> > but I don't see the point of making opting in more difficult than
> > necessary. Surely overwrite+block provides the security guarantee you
> > need from the threat model -- otherwise nobody will be able to use block
> > because you never know if one library will call memfd_create()
> > "incorrectly" without the new flags.
> > > >
> > > > If you want to block syscalls that don't explicitly pass NOEXEC_SEAL,
> > > > there are several tools for doing this (both seccomp and LSM hooks).
> > > >
> > > > [1]: https://lore.kernel.org/linux-fsdevel/20200131212021.GA108613@google.com/
> > > >
> > > > > Additional functionality/features should be implemented through
> > > > > security hook and LSM, not sysctl, I think.
> > > >
> > > > This issue with =2 cannot be fixed in an LSM. (On the other hand, you
> > > > could implement either =2 behaviour with an LSM using =1, and the
> > > > current strict =2 behaviour could be implemented purely with seccomp.)
> > > >
> > > By migration, I mean a system that is not fully migrated, such a
> > > system should just use "=0" or "=1". Additional features can be
> > > implemented in SELinux/Landlock/other LSM by a motivated dev. e.g. if
> > > a system wants to limit executable memfd to specific programs or fully
> > > disable it.
> > > "=2" is for a system/container that is fully migrated, in that case,
> > > SELinux/Landlock/LSM can do the same, but sysctl provides a convenient
> > > alternative.
> > > Yes, seccomp provides a similar mechanism. Indeed, combining "=1" and
> > > seccomp (block MFD_EXEC), it will overwrite + block X mfd, which is
> > > essentially what you want, iiuc. However, I do not wish to have this
> > > implemented in kernel, due to the thinking that I want kernel to get
> > > out of business of "overwriting" eventually.
> >
> > See my above comments -- "overwriting" is perfectly acceptable to me.
> > There's also no way to "get out of the business of overwriting" -- Linux
> > has strict backwards compatibility requirements.
> >
>
> I agree, if we weigh on the short term goal of letting the user space
> applications to do minimum, then having 4 state sysctl (or 2 sysctl,
> one controls overwrite, one disable/enable executable memfd) will do.
> But with that approach, I'm afraid a version of the future (say in 20
> years), most applications stays with memfd_create with the old API
> style, not setting the NX bit. With the current approach, it might seem
> to be less convenient, but I hope it offers a bit of incentive to make
> applications migrating their code towards the new API, explicitly
> setting the NX bit.
> I understand this hope is questionable, we might
> still end up the same in 20 years, but at least I tried :-). I will
> leave this decision to maintainers when you supply patches for that,
> and I wouldn't feel bad either way, there is a valid reason on both sides.

People will not switch on =2 if it can break existing programs that are
doing nothing wrong by not passing a no-op flag. In 20 years, at best
you would have =1 in widespread use, because the rewriting behaviour is
what users expect of kernel uAPIs. They expect old programs to work
without modifying them if they aren't doing anything wrong. A uAPI knob
that requires every userspace program to change before you can safely
enable it (especially because it ratchets in a way that makes it
dangerous to enable on production machines) will simply not be used.

If the goal is to get programs to update (which it seems it is), having
a knob that nobody will turn on doesn't help. Doing proper warning
logging is the way to get userspace to switch -- userspace usually
notices when their programs trigger warnings in dmesg.

> To supplement, there are two other ways for what you want:
> 1> seccomp to block MFD_EXEC, and leaving the setting to 1.

I made this point in an earlier mail. However, my point is that =2 is
not an acceptable uAPI, and if you want something that looks like =2 you
can implement that with seccomp too! In fact, the key difference is
that you cannot implement the rewriting easily in seccomp -- you would
need to install a seccomp_notify monitor that does nothing but rewrite
syscall arguments. This would be equivalent to running the entire system
under GDB to work around a uAPI flaw.

> 2> implement the blocking using a security hook and LSM, imo, which is
> probably the most common way to deal with this type of request (block
> something).

The issue is not the blocking, it's the rewriting.
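
To make the blocking-versus-rewriting distinction concrete, here is a
rough Python model of the semantics discussed in this thread. It is only
a sketch, not the kernel code: resolve(), seccomp_block_exec(), outcome()
and the Rejected exception are hypothetical names, the real errno values
are deliberately not modelled, and the =0 default (old-style calls act as
if MFD_EXEC was set) is taken from the sysctl documentation rather than
from this mail. The flag values are the ones in
include/uapi/linux/memfd.h.

```python
# Toy model of memfd_create(2) flag handling under vm.memfd_noexec.
# Illustration only; all function names here are made up.

MFD_NOEXEC_SEAL = 0x0008  # values from include/uapi/linux/memfd.h
MFD_EXEC = 0x0010


class Rejected(Exception):
    """The call is refused (which errno is returned is not modelled)."""


def resolve(flags, level, rewrite_at_2):
    """Return the effective flags for one call, or raise Rejected.

    level         -- value of vm.memfd_noexec (0, 1 or 2)
    rewrite_at_2  -- False: strict =2 as described in this thread
                     True:  overwrite+block =2 as argued for above
    Invalid combinations (both flags set) are not modelled.
    """
    if level == 2 and not rewrite_at_2:
        # Strict: anything not explicitly passing NOEXEC_SEAL fails,
        # including old-style callers that pass neither flag.
        if not flags & MFD_NOEXEC_SEAL:
            raise Rejected("MFD_NOEXEC_SEAL not passed explicitly")
        return flags
    if not flags & (MFD_EXEC | MFD_NOEXEC_SEAL):
        # Old-style call: a default is filled in ("overwriting").
        flags |= MFD_EXEC if level == 0 else MFD_NOEXEC_SEAL
    if level == 2 and flags & MFD_EXEC:
        # Executable memfds are still blocked.
        raise Rejected("executable memfd requested")
    return flags


def seccomp_block_exec(flags):
    """Stand-in for a plain seccomp filter: allow or deny, never rewrite."""
    if flags & MFD_EXEC:
        raise Rejected("MFD_EXEC blocked by filter")
    return flags


def outcome(fn, *args):
    """Run fn and fold a refusal into the string 'rejected'."""
    try:
        return fn(*args)
    except Rejected:
        return "rejected"


for flags in (0, MFD_EXEC, MFD_NOEXEC_SEAL):
    print(hex(flags),
          "| strict =2:", outcome(resolve, flags, 2, False),
          "| overwrite+block =2:", outcome(resolve, flags, 2, True))
```

In this model the two =2 variants differ only for the old-style caller
(flags == 0): strict refuses it, overwrite+block hands it a sealed
non-executable memfd. A deny-only filter layered on =1 gives the same
results as overwrite+block, whereas nothing deny-only can turn the strict
variant into the rewriting one.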

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH