Message-ID: <0988dfae-69d0-4fbf-b145-15f6e853cbcc@infradead.org>
Date: Mon, 10 Jun 2024 19:20:48 -0700
Subject: Re: [PATCH v1 1/1] mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
To: jeffxu@chromium.org
Cc: akpm@linux-foundation.org, cyphar@cyphar.com, david@readahead.eu,
 dmitry.torokhov@gmail.com, dverkamp@chromium.org, hughd@google.com,
 jeffxu@google.com, jorgelo@chromium.org, keescook@chromium.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
 linux-mm@kvack.org, pobrn@protonmail.com, skhan@linuxfoundation.org,
 stable@vger.kernel.org
References: <20240607203543.2151433-1-jeffxu@google.com>
 <20240607203543.2151433-2-jeffxu@google.com>
From: Randy Dunlap
In-Reply-To: <20240607203543.2151433-2-jeffxu@google.com>

Hi--

On 6/7/24 1:35 PM, jeffxu@chromium.org wrote:
> From: Jeff Xu
>
> Add documentation for memfd_create flags: FMD_NOEXEC_SEAL

s/FMD/MFD/

> and MFD_EXEC
>
> Signed-off-by: Jeff Xu
> ---
>  Documentation/userspace-api/index.rst      |  1 +
>  Documentation/userspace-api/mfd_noexec.rst | 86 ++++++++++++++++++++++
>  2 files changed, 87 insertions(+)
>  create mode 100644 Documentation/userspace-api/mfd_noexec.rst
>
> diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
> index 5926115ec0ed..8a251d71fa6e 100644
> --- a/Documentation/userspace-api/index.rst
> +++ b/Documentation/userspace-api/index.rst
> @@ -32,6 +32,7 @@ Security-related interfaces
>     seccomp_filter
>     landlock
>     lsm
> +   mfd_noexec
>     spec_ctrl
>     tee
>
> diff --git a/Documentation/userspace-api/mfd_noexec.rst b/Documentation/userspace-api/mfd_noexec.rst
> new file mode 100644
> index 000000000000..0d2c840f37e1
> --- /dev/null
> +++ b/Documentation/userspace-api/mfd_noexec.rst
> @@ -0,0 +1,86 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +==================================
> +Introduction of non executable mfd

                   non-executable mfd

> +==================================
> +:Author:
> +    Daniel Verkamp
> +    Jeff Xu
> +
> +:Contributor:
> +    Aleksa Sarai
> +
> +Since Linux introduced the memfd feature, memfd have always had their

                                             memfds
i.e., plural

> +execute bit set, and the memfd_create() syscall doesn't allow setting
> +it differently.
> +
> +However, in a secure by default system, such as ChromeOS, (where all

                 secure-by-default

> +executables should come from the rootfs, which is protected by Verified
> +boot), this executable nature of memfd opens a door for NoExec bypass
> +and enables “confused deputy attack”.  E.g, in VRP bug [1]: cros_vm
> +process created a memfd to share the content with an external process,
> +however the memfd is overwritten and used for executing arbitrary code
> +and root escalation. [2] lists more VRP in this kind.

                                           of this kind.

> +
> +On the other hand, executable memfd has its legit use, runc uses memfd’s

                                                      use:

> +seal and executable feature to copy the contents of the binary then
> +execute them, for such system, we need a solution to differentiate runc's

           them. For such a system,

> +use of  executable memfds and an attacker's [3].
> +
> +To address those above.

                    above:

> + - Let memfd_create() set X bit at creation time.
> + - Let memfd be sealed for modifying X bit when NX is set.
> + - A new pid namespace sysctl: vm.memfd_noexec to help applications to

     - Add a new                                       applications in

> +   migrating and enforcing non-executable MFD.
> +
> +User API
> +========
> +``int memfd_create(const char *name, unsigned int flags)``
> +
> +``MFD_NOEXEC_SEAL``
> +    When MFD_NOEXEC_SEAL bit is set in the ``flags``, memfd is created
> +    with NX. F_SEAL_EXEC is set and the memfd can't be modified to
> +    add X later. MFD_ALLOW_SEALING is also implied.
> +    This is the most common case for the application to use memfd.
> +
> +``MFD_EXEC``
> +    When MFD_EXEC bit is set in the ``flags``, memfd is created with X.
> +
> +Note:
> +    ``MFD_NOEXEC_SEAL`` implies ``MFD_ALLOW_SEALING``. In case that
> +    app doesn't want sealing, it can add F_SEAL_SEAL after creation.

       an app

> +
> +
> +Sysctl:
> +========
> +``pid namespaced sysctl vm.memfd_noexec``
> +
> +The new pid namespaced sysctl vm.memfd_noexec has 3 values:
> +
> + - 0: MEMFD_NOEXEC_SCOPE_EXEC
> +    memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
> +    MFD_EXEC was set.
> +
> + - 1: MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL
> +    memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
> +    MFD_NOEXEC_SEAL was set.
> +
> + - 2: MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED
> +    memfd_create() without MFD_NOEXEC_SEAL will be rejected.
> +
> +The sysctl allows finer control of memfd_create for old-software that

                                                       old software

> +doesn't set the executable bit, for example, a container with

                               bit;

> +vm.memfd_noexec=1 means the old-software will create non-executable memfd

                               old software

> +by default while new-software can create executable memfd by setting

                    new software

> +MFD_EXEC.
> +
> +The value of vm.memfd_noexec is passed to child namespace at creation
> +time, in addition, the setting is hierarchical, i.e. during memfd_create,

   time. In addition,

> +we will search from current ns to root ns and use the most restrictive
> +setting.
> +
> +[1] https://crbug.com/1305267
> +
> +[2] https://bugs.chromium.org/p/chromium/issues/list?q=type%3Dbug-security%20memfd%20escalation&can=1
> +
> +[3] https://lwn.net/Articles/781013/

--
~Randy