References: <20230714114753.170814-1-david@readahead.eu>
Date: Fri, 28 Jul 2023 09:54:57 +0200
From: "David Rheinsberg"
To: "Jeff Xu"
Cc: linux-kernel@vger.kernel.org, "Andrew Morton", "Kees Cook", "Daniel Verkamp", linux-mm@kvack.org, "Peter Xu", linux-hardening@vger.kernel.org
Subject: Re: [PATCH] memfd: support MFD_NOEXEC alongside MFD_EXEC

Hi

On Tue, Jul 18, 2023, at 9:03 PM, Jeff Xu wrote:
> Hi David
>
> Thanks email and patch for discussion.
>
> On Fri, Jul 14, 2023 at 4:48 AM David Rheinsberg wrote:
>>
>> Add a new flag for memfd_create() called MFD_NOEXEC, which has the
>> opposite effect as MFD_EXEC (i.e., it strips the executable bits from
>> the inode file mode).
>>
> I previously thought about having the symmetric flags, such as
> MFD_NOEXEC/MFD_EXEC/MFD_NOEXEC_SEAL/MFD_EXEC_SEAL, but decided against
> it. The app shall decide beforehand what the memfd is created for, if
> it is no-executable, then it should be sealed, such that it can't be
> chmod to enable "X" bit.

My point is, an application might decide to *not* seal a property,
because it knows it has to change it later on. But it might still want
to disable the executable bit initially, so as to avoid having
executable pages around that can be exploited.

>> The default mode for memfd_create() has historically been to use 0777 as
>> file modes. The new `MFD_EXEC` flag has now made this explicit, paving
>> the way to reduce the default to 0666 and thus dropping the executable
>> bits for security reasons. Additionally, the `MFD_NOEXEC_SEAL` flag has
>> been added which allows this without changing the default right now.
>>
>> Unfortunately, `MFD_NOEXEC_SEAL` enforces `MFD_ALLOW_SEALING` and
>> `F_SEAL_EXEC`, with no alternatives available. This leads to multiple
>> issues:
>>
>> * Applications that do not want to allow sealing either have to use
>>   `MFD_EXEC` (which we don't want, unless they actually need the
>>   executable bits), or they must add `F_SEAL_SEAL` immediately on
>>   return of memfd_create(2) with `MFD_NOEXEC_SEAL`, since this
>>   implicitly enables sealing.
>>
>>   Note that we explicitly added `MFD_ALLOW_SEALING` when creating
>>   memfd_create(2), because enabling seals on existing users of shmem
>>   without them knowing about it can easily introduce DoS scenarios.
>
> The application that doesn't want MFD_NOEXEC_SEAL can use MFD_EXEC,
> the kernel won't add MFD_ALLOW_SEALING implicitly.
> MFD_EXEC makes the
> kernel behave the same as before, this is also why sysctl
> vm.memfd_noexec=0 can work seamlessly.
>
>>   It
>>   is unclear why `MFD_NOEXEC_SEAL` was designed to enable seals, and
>>   this is especially dangerous with `MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL`
>>   set via sysctl, since it will silently enable sealing on every memfd
>>   created.
>>
> Without sealing, chmod(2) can modify the mfd to be executable, that is
> the consideration that MFD_NOEXEC is not provided as an option.
> Indeed, current design is "biased" towards promoting MFD_NOEXEC_SEAL
> as the safest approach, and try to avoid the pitfall that dev
> accidently uses "MFD_NOEXEC" without realizing it can still be
> chmod().

I think I didn't get my point across. Imagine an application that does
*NOT* use sealing, but uses memfds. This application shares memfds with
untrusted clients, and does this in a safe way (SIGBUS protected).
Everything works fine, unless someone decides to enable
`vm.memfd_noexec=2`. Suddenly, the memfd will have sealing enabled
*without* the application ever requesting this. Now any untrusted
client that got the memfd can add seals to the memfd, even though the
creator of the memfd did not enable sealing. This client can now seal
WRITES on the memfd, even though it really should not be able to do
that.

(This is not a hypothetical setup; we have such setups for data sharing
already.)

Thus, setting the security-option `memfd_noexec` *breaks* applications,
because it enables sealing. If `MFD_NOEXEC_SEAL` did *not* imply
`MFD_ALLOW_SEALING`, this would not be an issue. IOW, why does
`MFD_NOEXEC_SEAL` clear `F_SEAL_SEAL` even if `MFD_ALLOW_SEALING` is
not set?

>> * Applications that do not want `MFD_EXEC`, but rely on
>>   `F_GET_SEALS` to *NOT* return `F_SEAL_EXEC` have no way of achieving
>>   this other than using `MFD_EXEC` and clearing the executable bits
>>   manually via fchmod(2).
>>   Using `MFD_NOEXEC_SEAL` will set
>>   `F_SEAL_EXEC` and thus break existing code that hard-codes the
>>   seal-set.
>>
>>   This is already an issue when sending log-memfds to systemd-journald
>>   today, which verifies the seal-set of the received FD and fails if
>>   unknown seals are set. Hence, you have to use `MFD_EXEC` when
>>   creating memfds for this purpose, even though you really do not need
>>   the executable bits set.
>>
>> * Applications that want to enable the executable bit later on,
>>   currently have no way to create the memfd without it. They have to
>>   clear the bits immediately after creating it via fchmod(2), or just
>>   leave them set.
>>
> Is it OK to do what you want in two steps ? What is the concern there ? i.e.
> memfd_create(MFD_EXEC), then chmod to remove the "X" bit.
>
> I imagine this is probably already what the application does for
> creating no-executable mfd before my patch, i.e.:
> memfd_create(), then chmod() to remove "X" to remove "X" bit.

Yes, correct, this is not a technical issue, but rather an API issue. I
don't think most memfd-users are aware that their inode has the
executable bit set, and they likely don't want it. But for
backwards-compatibility reasons (as noted above), they cannot use
`MFD_NOEXEC_SEAL`. Hence, we have to make them explicitly request an
executable memfd via `MFD_EXEC` now, even though they clearly do not
want this. And then we have to add code to drop the executable bit
immediately afterwards.

I don't understand why we don't add `MFD_NOEXEC` and thus make it a lot
easier to patch existing applications. That would also make it explicit
that these applications don't care for the executable bit, rather than
forcing them to request the executable bit just to drop it immediately.

The downside of `MFD_NOEXEC` is that it might be picked over
`MFD_NOEXEC_SEAL` by uneducated users, thus reducing security.
But right now, the alternative is that existing code picks `MFD_EXEC`
instead and never clears the executable bit, because it is a hassle to
do so.

Or is there another reason *not* to include `MFD_NOEXEC`? I am not sure
I fully understand why you fight it so vehemently.

Thanks
David