[RFC PATCH 2/3] memfd: remove racheting feature from vm.memfd_noexec

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Aleksa Sarai <cyphar@cyphar.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	Jeff Xu <jeffxu@google.com>, Aleksa Sarai <cyphar@cyphar.com>,
	YueHaibing <yuehaibing@huawei.com>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Daniel Verkamp <dverkamp@chromium.org>
Cc: linux-mm@kvack.org, Dominique Martinet <asmadeus@codewreck.org>,
	Christian Brauner <brauner@kernel.org>,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 2/3] memfd: remove racheting feature from vm.memfd_noexec
Date: Fri, 14 Jul 2023 00:33:47 +1000	[thread overview]
Message-ID: <20230713143406.14342-3-cyphar@cyphar.com> (raw)
In-Reply-To: <20230713143406.14342-1-cyphar@cyphar.com>

This sysctl has the very unusal behaviour of not allowing any user (even
CAP_SYS_ADMIN) to reduce the restriction setting, meaning that if you
were to set this sysctl to a more restrictive option in the host pidns
you would need to reboot your machine in order to reset it.

The justification given in [1] is that this is a security feature and
thus it should not be possible to disable. Aside from the fact that we
have plenty of security-related sysctls that can be disabled after being
enabled (fs.protected_symlinks for instance), the protection provided by
the sysctl is to stop users from being able to create a binary and then
execute it. A user with CAP_SYS_ADMIN can trivially do this without
memfd_create(2):

  % cat mount-memfd.c
  #include <fcntl.h>
  #include <string.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <linux/mount.h>

  #define SHELLCODE "#!/bin/echo this file was executed from this totally private tmpfs:"

  int main(void)
  {
  	int fsfd = fsopen("tmpfs", FSOPEN_CLOEXEC);
  	assert(fsfd >= 0);
  	assert(!fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 2));

  	int dfd = fsmount(fsfd, FSMOUNT_CLOEXEC, 0);
  	assert(dfd >= 0);

  	int execfd = openat(dfd, "exe", O_CREAT | O_RDWR | O_CLOEXEC, 0782);
  	assert(execfd >= 0);
  	assert(write(execfd, SHELLCODE, strlen(SHELLCODE)) == strlen(SHELLCODE));
  	assert(!close(execfd));

  	char *execpath = NULL;
  	char *argv[] = { "bad-exe", NULL }, *envp[] = { NULL };
  	execfd = openat(dfd, "exe", O_PATH | O_CLOEXEC);
  	assert(execfd >= 0);
  	assert(asprintf(&execpath, "/proc/self/fd/%d", execfd) > 0);
  	assert(!execve(execpath, argv, envp));
  }
  % ./mount-memfd
  this file was executed from this totally private tmpfs: /proc/self/fd/5
  %

Given that it is possible for CAP_SYS_ADMIN users to create executable
binaries without memfd_create(2) and without touching the host
filesystem (not to mention the many other things a CAP_SYS_ADMIN process
would be able to do that would be equivalent or worse), it seems strange
to cause a fair amount of headache to admins when there doesn't appear
to be an actual security benefit to blocking this.

It should be noted that with this change, programs that can do an
unprivileged unshare(CLONE_NEWUSER) would be able to create an
executable memfd even if their current pidns didn't allow it. However,
the same sample program above can also be used in this scenario, meaning
that even with this consideration, blocking CAP_SYS_ADMIN makes little
sense:

  % unshare -rm ./mount-memfd
  this file was executed from this totally private tmpfs: /proc/self/fd/5

This simply further reinforces that locked-down environments need to
disallow CLONE_NEWUSER for unprivileged users (as is already the case in
most container environments).

[1]: https://lore.kernel.org/all/CABi2SkWnAgHK1i6iqSqPMYuNEhtHBkO8jUuCvmG3RmUB5TKHJw@mail.gmail.com/

Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: stable@vger.kernel.org # v6.3+
Fixes: 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
 kernel/pid_sysctl.h | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/kernel/pid_sysctl.h b/kernel/pid_sysctl.h
index b26e027fc9cd..8a22bc29ebb4 100644
--- a/kernel/pid_sysctl.h
+++ b/kernel/pid_sysctl.h
@@ -24,13 +24,6 @@ static int pid_mfd_noexec_dointvec_minmax(struct ctl_table *table,
 	if (ns != &init_pid_ns)
 		table_copy.data = &ns->memfd_noexec_scope;

-	/*
-	 * set minimum to current value, the effect is only bigger
-	 * value is accepted.
-	 */
-	if (*(int *)table_copy.data > *(int *)table_copy.extra1)
-		table_copy.extra1 = table_copy.data;
-
 	return proc_dointvec_minmax(&table_copy, write, buf, lenp, ppos);
 }

-- 
2.41.0

next prev parent reply	other threads:[~2023-07-13 14:34 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-13 14:33 [RFC PATCH 0/3] memfd: cleanups for vm.memfd_noexec Aleksa Sarai
2023-07-13 14:33 ` [RFC PATCH 1/3] memfd: cleanups for vm.memfd_noexec handling Aleksa Sarai
2023-07-13 14:33 ` Aleksa Sarai [this message]
2023-07-14  0:07   ` [RFC PATCH 2/3] memfd: remove racheting feature from vm.memfd_noexec Aleksa Sarai
2023-07-13 14:33 ` [RFC PATCH 3/3] selftests: memfd: error out test process when child test fails Aleksa Sarai
2023-07-19 23:38   ` Jeff Xu
2023-08-03  2:25     ` Aleksa Sarai
2023-08-03  2:55       ` Jeff Xu
2023-07-18  0:47 ` [RFC PATCH 0/3] memfd: cleanups for vm.memfd_noexec Jeff Xu
2023-07-19  3:10   ` Aleksa Sarai
2023-07-19  7:17     ` Jeff Xu
2023-08-02  1:04       ` Aleksa Sarai
2023-08-02 19:47         ` Jeff Xu
2023-08-02 19:54           ` Jeff Xu
2023-08-02 21:38           ` Aleksa Sarai
2023-08-02 23:11             ` Jeff Xu
2023-08-03  4:39               ` Aleksa Sarai
2023-08-03 14:40                 ` Jeff Xu
2023-08-02 20:45         ` Jeff Xu
2023-08-02 21:48           ` Aleksa Sarai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230713143406.14342-3-cyphar@cyphar.com \
    --to=cyphar@cyphar.com \
    --cc=akpm@linux-foundation.org \
    --cc=asmadeus@codewreck.org \
    --cc=brauner@kernel.org \
    --cc=dverkamp@chromium.org \
    --cc=jeffxu@google.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mcgrof@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=yuehaibing@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox