From: Jeff Xu <jeffxu@google.com>
To: David Rheinsberg <david@readahead.eu>
Cc: linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Kees Cook <keescook@chromium.org>,
Daniel Verkamp <dverkamp@chromium.org>,
linux-mm@kvack.org, Peter Xu <peterx@redhat.com>,
linux-hardening@vger.kernel.org
Subject: Re: [PATCH] memfd: support MFD_NOEXEC alongside MFD_EXEC
Date: Tue, 18 Jul 2023 12:03:57 -0700 [thread overview]
Message-ID: <CALmYWFsjy2jOfKyM3Gd3Ag+p6u5ejDoBp6RhqcXkcAkMiby4SA@mail.gmail.com> (raw)
In-Reply-To: <20230714114753.170814-1-david@readahead.eu>
Hi David
Thanks email and patch for discussion.
On Fri, Jul 14, 2023 at 4:48 AM David Rheinsberg <david@readahead.eu> wrote:
>
> Add a new flag for memfd_create() called MFD_NOEXEC, which has the
> opposite effect as MFD_EXEC (i.e., it strips the executable bits from
> the inode file mode).
>
I previously thought about having the symmetric flags, such as
MFD_NOEXEC/MFD_EXEC/MFD_NOEXEC_SEAL/MFD_EXEC_SEAL, but decided against
it. The app shall decide beforehand what the memfd is created for, if
it is no-executable, then it should be sealed, such that it can't be
chmod to enable "X" bit.
> The default mode for memfd_create() has historically been to use 0777 as
> file modes. The new `MFD_EXEC` flag has now made this explicit, paving
> the way to reduce the default to 0666 and thus dropping the executable
> bits for security reasons. Additionally, the `MFD_NOEXEC_SEAL` flag has
> been added which allows this without changing the default right now.
>
> Unfortunately, `MFD_NOEXEC_SEAL` enforces `MFD_ALLOW_SEALING` and
> `F_SEAL_EXEC`, with no alternatives available. This leads to multiple
> issues:
>
> * Applications that do not want to allow sealing either have to use
> `MFD_EXEC` (which we don't want, unless they actually need the
> executable bits), or they must add `F_SEAL_SEAL` immediately on
> return of memfd_create(2) with `MFD_NOEXEC_SEAL`, since this
> implicitly enables sealing.
>
> Note that we explicitly added `MFD_ALLOW_SEALING` when creating
> memfd_create(2), because enabling seals on existing users of shmem
> without them knowing about it can easily introduce DoS scenarios.
The application that doesn't want MFD_NOEXEC_SEAL can use MFD_EXEC,
the kernel won't add MFD_ALLOW_SEALING implicitly. MFD_EXEC makes the
kernel behave the same as before, this is also why sysctl
vm.memfd_noexec=0 can work seamlessly.
> It
> is unclear why `MFD_NOEXEC_SEAL` was designed to enable seals, and
> this is especially dangerous with `MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL`
> set via sysctl, since it will silently enable sealing on every memfd
> created.
>
Without sealing, chmod(2) can modify the mfd to be executable, that is
the consideration that MFD_NOEXEC is not provided as an option.
Indeed, current design is "biased" towards promoting MFD_NOEXEC_SEAL
as the safest approach, and try to avoid the pitfall that dev
accidently uses "MFD_NOEXEC" without realizing it can still be
chmod().
> * Applications that do not want `MFD_EXEC`, but rely on
> `F_GET_SEALS` to *NOT* return `F_SEAL_EXEC` have no way of achieving
> this other than using `MFD_EXEC` and clearing the executable bits
> manually via fchmod(2). Using `MFD_NOEXEC_SEAL` will set
> `F_SEAL_EXEC` and thus break existing code that hard-codes the
> seal-set.
>
> This is already an issue when sending log-memfds to systemd-journald
> today, which verifies the seal-set of the received FD and fails if
> unknown seals are set. Hence, you have to use `MFD_EXEC` when
> creating memfds for this purpose, even though you really do not need
> the executable bits set.
>
> * Applications that want to enable the executable bit later on,
> currently have no way to create the memfd without it. They have to
> clear the bits immediately after creating it via fchmod(2), or just
> leave them set.
>
Is it OK to do what you want in two steps ? What is the concern there ? i.e.
memfd_create(MFD_EXEC), then chmod to remove the "X" bit.
I imagine this is probably already what the application does for
creating no-executable mfd before my patch, i.e.:
memfd_create(), then chmod() to remove "X" to remove "X" bit.
Thanks!
-Jeff
-Jeff
> By adding MFD_NOEXEC, user-space is now able to clear the executable
> bits on all memfds immediately, clearly stating their intention, without
> requiring fixups after creating the memfd. In particular, it is highly
> useful for existing use-cases that cannot allow new seals to appear on
> memfds.
>
> Signed-off-by: David Rheinsberg <david@readahead.eu>
> ---
> include/linux/pid_namespace.h | 4 ++--
> include/uapi/linux/memfd.h | 1 +
> mm/memfd.c | 29 ++++++++++++++---------------
> 3 files changed, 17 insertions(+), 17 deletions(-)
>
> diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
> index c758809d5bcf..02f8acf94512 100644
> --- a/include/linux/pid_namespace.h
> +++ b/include/linux/pid_namespace.h
> @@ -19,9 +19,9 @@ struct fs_pin;
> #if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE)
> /*
> * sysctl for vm.memfd_noexec
> - * 0: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL
> + * 0: memfd_create() without MFD_EXEC nor MFD_NOEXEC
> * acts like MFD_EXEC was set.
> - * 1: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL
> + * 1: memfd_create() without MFD_EXEC nor MFD_NOEXEC
> * acts like MFD_NOEXEC_SEAL was set.
> * 2: memfd_create() without MFD_NOEXEC_SEAL will be
> * rejected.
> diff --git a/include/uapi/linux/memfd.h b/include/uapi/linux/memfd.h
> index 273a4e15dfcf..b9e13bd65817 100644
> --- a/include/uapi/linux/memfd.h
> +++ b/include/uapi/linux/memfd.h
> @@ -12,6 +12,7 @@
> #define MFD_NOEXEC_SEAL 0x0008U
> /* executable */
> #define MFD_EXEC 0x0010U
> +#define MFD_NOEXEC 0x0020U
>
> /*
> * Huge page size encoding when MFD_HUGETLB is specified, and a huge page
> diff --git a/mm/memfd.c b/mm/memfd.c
> index e763e76f1106..2afe49368fc5 100644
> --- a/mm/memfd.c
> +++ b/mm/memfd.c
> @@ -266,7 +266,9 @@ long memfd_fcntl(struct file *file, unsigned int cmd, unsigned int arg)
> #define MFD_NAME_PREFIX_LEN (sizeof(MFD_NAME_PREFIX) - 1)
> #define MFD_NAME_MAX_LEN (NAME_MAX - MFD_NAME_PREFIX_LEN)
>
> -#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB | MFD_NOEXEC_SEAL | MFD_EXEC)
> +#define MFD_ALL_FLAGS \
> + (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB | \
> + MFD_NOEXEC_SEAL | MFD_EXEC | MFD_NOEXEC)
>
> SYSCALL_DEFINE2(memfd_create,
> const char __user *, uname,
> @@ -289,11 +291,13 @@ SYSCALL_DEFINE2(memfd_create,
> return -EINVAL;
> }
>
> - /* Invalid if both EXEC and NOEXEC_SEAL are set.*/
> - if ((flags & MFD_EXEC) && (flags & MFD_NOEXEC_SEAL))
> + if (flags & MFD_NOEXEC_SEAL)
> + flags |= MFD_ALLOW_SEALING | MFD_NOEXEC;
> +
> + if ((flags & MFD_EXEC) && (flags & MFD_NOEXEC))
> return -EINVAL;
>
> - if (!(flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) {
> + if (!(flags & (MFD_EXEC | MFD_NOEXEC))) {
> #ifdef CONFIG_SYSCTL
> int sysctl = MEMFD_NOEXEC_SCOPE_EXEC;
> struct pid_namespace *ns;
> @@ -366,20 +370,15 @@ SYSCALL_DEFINE2(memfd_create,
> file->f_mode |= FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE;
> file->f_flags |= O_LARGEFILE;
>
> - if (flags & MFD_NOEXEC_SEAL) {
> - struct inode *inode = file_inode(file);
> + if (flags & MFD_NOEXEC)
> + file_inode(file)->i_mode &= ~0111;
>
> - inode->i_mode &= ~0111;
> - file_seals = memfd_file_seals_ptr(file);
> - if (file_seals) {
> + file_seals = memfd_file_seals_ptr(file);
> + if (file_seals) {
> + if (flags & MFD_ALLOW_SEALING)
> *file_seals &= ~F_SEAL_SEAL;
> + if (flags & MFD_NOEXEC_SEAL)
> *file_seals |= F_SEAL_EXEC;
> - }
> - } else if (flags & MFD_ALLOW_SEALING) {
> - /* MFD_EXEC and MFD_ALLOW_SEALING are set */
> - file_seals = memfd_file_seals_ptr(file);
> - if (file_seals)
> - *file_seals &= ~F_SEAL_SEAL;
> }
>
> fd_install(fd, file);
> --
> 2.41.0
>
next parent reply other threads:[~2023-07-18 19:04 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20230714114753.170814-1-david@readahead.eu>
2023-07-18 19:03 ` Jeff Xu [this message]
2023-07-28 7:54 ` David Rheinsberg
2023-08-01 19:24 ` Jeff Xu
2023-08-02 7:56 ` David Rheinsberg
2023-08-03 0:06 ` Jeff Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CALmYWFsjy2jOfKyM3Gd3Ag+p6u5ejDoBp6RhqcXkcAkMiby4SA@mail.gmail.com \
--to=jeffxu@google.com \
--cc=akpm@linux-foundation.org \
--cc=david@readahead.eu \
--cc=dverkamp@chromium.org \
--cc=keescook@chromium.org \
--cc=linux-hardening@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=peterx@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox