From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20240607203543.2151433-1-jeffxu@google.com>
 <20240607203543.2151433-2-jeffxu@google.com>
 <0988dfae-69d0-4fbf-b145-15f6e853cbcc@infradead.org>
In-Reply-To: <0988dfae-69d0-4fbf-b145-15f6e853cbcc@infradead.org>
From: Jeff Xu
Date: Mon, 10 Jun 2024 20:32:34 -0700
Subject: Re: [PATCH v1 1/1] mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
To: Randy Dunlap
Cc: jeffxu@chromium.org, akpm@linux-foundation.org, cyphar@cyphar.com,
 david@readahead.eu, dmitry.torokhov@gmail.com, dverkamp@chromium.org,
 hughd@google.com, jorgelo@chromium.org, keescook@chromium.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
 linux-mm@kvack.org, pobrn@protonmail.com, skhan@linuxfoundation.org,
 stable@vger.kernel.org
Content-Type: text/plain; charset="UTF-8"
Sender: owner-linux-mm@kvack.org

Hi

On Mon, Jun 10, 2024 at 7:20 PM Randy Dunlap wrote:
>
> Hi--
>
> On 6/7/24 1:35 PM, jeffxu@chromium.org wrote:
> > From: Jeff Xu
> >
> > Add documentation for memfd_create flags: FMD_NOEXEC_SEAL
>
> s/FMD/MFD/
>
> > and MFD_EXEC
> >
> > Signed-off-by: Jeff Xu
> > ---
> >  Documentation/userspace-api/index.rst      |  1 +
> >  Documentation/userspace-api/mfd_noexec.rst | 86 ++++++++++++++++++++++
> >  2 files changed, 87 insertions(+)
> >  create mode 100644 Documentation/userspace-api/mfd_noexec.rst
> >
> > diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
> > index 5926115ec0ed..8a251d71fa6e 100644
> > --- a/Documentation/userspace-api/index.rst
> > +++ b/Documentation/userspace-api/index.rst
> > @@ -32,6 +32,7 @@ Security-related interfaces
> >     seccomp_filter
> >     landlock
> >     lsm
> > +   mfd_noexec
> >     spec_ctrl
> >     tee
> >
> > diff --git a/Documentation/userspace-api/mfd_noexec.rst b/Documentation/userspace-api/mfd_noexec.rst
> > new file mode 100644
> > index 000000000000..0d2c840f37e1
> > --- /dev/null
> > +++ b/Documentation/userspace-api/mfd_noexec.rst
> > @@ -0,0 +1,86 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +==================================
> > +Introduction of non executable mfd
>
> non-executable mfd
>
> > +==================================
> > +:Author:
> > +    Daniel Verkamp
> > +    Jeff Xu
> > +
> > +:Contributor:
> > +    Aleksa Sarai
> > +
> > +Since Linux introduced the memfd feature, memfd have always had their
>
> memfds
> i.e., plural
>
> > +execute bit set, and the memfd_create() syscall doesn't allow setting
> > +it differently.
> > +
> > +However, in a secure by default system, such as ChromeOS, (where all
>
> secure-by-default
>
> > +executables should come from the rootfs, which is protected by Verified
> > +boot), this executable nature of memfd opens a door for NoExec bypass
> > +and enables “confused deputy attack”. E.g, in VRP bug [1]: cros_vm
> > +process created a memfd to share the content with an external process,
> > +however the memfd is overwritten and used for executing arbitrary code
> > +and root escalation. [2] lists more VRP in this kind.
>
> of this kind.
>
> > +
> > +On the other hand, executable memfd has its legit use, runc uses memfd’s
>
> use:
>
> > +seal and executable feature to copy the contents of the binary then
> > +execute them, for such system, we need a solution to differentiate runc's
>
> them. For such a system,
>
> > +use of executable memfds and an attacker's [3].
> > +
> > +To address those above.
>
> above:
>
> > + - Let memfd_create() set X bit at creation time.
> > + - Let memfd be sealed for modifying X bit when NX is set.
> > + - A new pid namespace sysctl: vm.memfd_noexec to help applications to
>
>     - Add a new                                           applications in
>
> > +   migrating and enforcing non-executable MFD.
> > +
> > +User API
> > +========
> > +``int memfd_create(const char *name, unsigned int flags)``
> > +
> > +``MFD_NOEXEC_SEAL``
> > +    When MFD_NOEXEC_SEAL bit is set in the ``flags``, memfd is created
> > +    with NX. F_SEAL_EXEC is set and the memfd can't be modified to
> > +    add X later. MFD_ALLOW_SEALING is also implied.
> > +    This is the most common case for the application to use memfd.
> > +
> > +``MFD_EXEC``
> > +    When MFD_EXEC bit is set in the ``flags``, memfd is created with X.
> > +
> > +Note:
> > +    ``MFD_NOEXEC_SEAL`` implies ``MFD_ALLOW_SEALING``. In case that
> > +    app doesn't want sealing, it can add F_SEAL_SEAL after creation.
>
> an app
>
> > +
> > +
> > +Sysctl:
> > +========
> > +``pid namespaced sysctl vm.memfd_noexec``
> > +
> > +The new pid namespaced sysctl vm.memfd_noexec has 3 values:
> > +
> > + - 0: MEMFD_NOEXEC_SCOPE_EXEC
> > +    memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
> > +    MFD_EXEC was set.
> > +
> > + - 1: MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL
> > +    memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
> > +    MFD_NOEXEC_SEAL was set.
> > +
> > + - 2: MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED
> > +    memfd_create() without MFD_NOEXEC_SEAL will be rejected.
> > +
> > +The sysctl allows finer control of memfd_create for old-software that
>
> old software
>
> > +doesn't set the executable bit, for example, a container with
>
> bit;
>
> > +vm.memfd_noexec=1 means the old-software will create non-executable memfd
>
> old software
>
> > +by default while new-software can create executable memfd by setting
>
> new software
>
> > +MFD_EXEC.
> > +
> > +The value of vm.memfd_noexec is passed to child namespace at creation
> > +time, in addition, the setting is hierarchical, i.e. during memfd_create,
>
> time. In addition,
>

Updated in V2.

Thanks!
-Jeff

> > +we will search from current ns to root ns and use the most restrictive
> > +setting.
> > +
> > +[1] https://crbug.com/1305267
> > +
> > +[2] https://bugs.chromium.org/p/chromium/issues/list?q=type%3Dbug-security%20memfd%20escalation&can=1
> > +
> > +[3] https://lwn.net/Articles/781013/
>
> --
> ~Randy
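
For illustration only (not part of the patch or of the review above): a
minimal C sketch of the "User API" section quoted above. It creates a memfd
with MFD_NOEXEC_SEAL and reads back its seals to see whether F_SEAL_EXEC was
applied. The helper names make_noexec_memfd() and memfd_is_exec_sealed() are
hypothetical. The fallback #defines are an assumption for libc headers that
predate these flags; their values are taken from the kernel uapi headers. On
a kernel without MFD_NOEXEC_SEAL support the call is expected to fail with
EINVAL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallbacks for older libc headers (assumed values from the kernel uapi). */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif
#ifndef MFD_EXEC
#define MFD_EXEC 0x0010U
#endif
#ifndef F_SEAL_EXEC
#define F_SEAL_EXEC 0x0020
#endif

/*
 * Hypothetical helper: ask for a memfd created with NX and sealed
 * against adding X later. Returns the fd, or -1 with errno set
 * (EINVAL is expected on kernels that do not know the flag).
 */
int make_noexec_memfd(const char *name)
{
	assert(name != NULL);
	return memfd_create(name, MFD_CLOEXEC | MFD_NOEXEC_SEAL);
}

/* Hypothetical helper: 1 if fd carries F_SEAL_EXEC, 0 if not, -1 on error. */
int memfd_is_exec_sealed(int fd)
{
	int seals = fcntl(fd, F_GET_SEALS);

	if (seals < 0)
		return -1;
	return (seals & F_SEAL_EXEC) ? 1 : 0;
}
```

A caller would check the fd returned by make_noexec_memfd("demo") and, on
success, expect memfd_is_exec_sealed(fd) to return 1. Passing MFD_EXEC
instead requests an executable memfd. With neither flag set, the result
depends on vm.memfd_noexec as described in the quoted Sysctl section.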