Re: [PATCH v2 0/5] mm/memfd: MFD_NOEXEC for memfd_create

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Jeff Xu <jeffxu@chromium.org>
To: Kees Cook <keescook@chromium.org>
Cc: jeffxu@google.com, skhan@linuxfoundation.org,
	akpm@linux-foundation.org,  dmitry.torokhov@gmail.com,
	dverkamp@chromium.org, hughd@google.com,  jorgelo@chromium.org,
	linux-kernel@vger.kernel.org,  linux-kselftest@vger.kernel.org,
	linux-mm@kvack.org, mnissler@chromium.org,  jannh@google.com,
	linux-hardening@vger.kernel.org,
	 Aleksa Sarai <cyphar@cyphar.com>,
	dev@opencontainers.org,  Christian Brauner <brauner@kernel.org>
Subject: Re: [PATCH v2 0/5] mm/memfd: MFD_NOEXEC for memfd_create
Date: Tue, 1 Nov 2022 16:14:39 -0700	[thread overview]
Message-ID: <CABi2SkVXMUVhSTJezfHt_BKxyKP+x++9oveuB3qJZL7N672UKw@mail.gmail.com> (raw)
In-Reply-To: <202208081018.9C782F184C@keescook>

Hi Kees

Sorry for the long overdue reply.

Those questions are really helpful  to understand the usage of memfd_create,
I will try to answer them, please see below inline.

On Mon, Aug 8, 2022 at 10:46 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Fri, Aug 05, 2022 at 10:21:21PM +0000, jeffxu@google.com wrote:
> > This v2 series MFD_NOEXEC, this series includes:
> > 1> address comments in V1
> > 2> add sysctl (vm.mfd_noexec) to change the default file permissions
> >     of memfd_create to be non-executable.
> >
> > Below are cover-level for v1:
> >
> > The default file permissions on a memfd include execute bits, which
> > means that such a memfd can be filled with a executable and passed to
> > the exec() family of functions. This is undesirable on systems where all
> > code is verified and all filesystems are intended to be mounted noexec,
> > since an attacker may be able to use a memfd to load unverified code and
> > execute it.
>
> I would absolutely like to see some kind of protection here. However,
> I'd like a more specific threat model. What are the cases where the X
> bit has been abused (e.g.[1])? What are the cases where the X bit is
> needed (e.g.[2])? With those in mind, it should be possible to draw
> a clear line between the two cases. (e.g. we need to avoid a confused
> deputy attack where an "unprivileged" user can pass an executable memfd
> to a "privileged" user. How those privileges are defined may matter a
> lot based on how memfds are being used. For example, can runc's use of
> executable memfds be distinguished from an attacker's?)
>
runc needs memfd to be executable, so the host with runc need to be able to
create both non-executable memfd and executable memfd.
memfd_create API itself can't enforce the security of how it is being used.

> > Additionally, execution via memfd is a common way to avoid scrutiny for
> > malicious code, since it allows execution of a program without a file
> > ever appearing on disk. This attack vector is not totally mitigated with
> > this new flag, since the default memfd file permissions must remain
> > executable to avoid breaking existing legitimate uses, but it should be
> > possible to use other security mechanisms to prevent memfd_create calls
> > without MFD_NOEXEC on systems where it is known that executable memfds
> > are not necessary.
>
> This reminds me of dealing with non-executable stacks. There ended up
> being three states:
>
> - requested to be executable (PT_GNU_STACK X)
> - requested to be non-executable (PT_GNU_STACK NX)
> - undefined (no PT_GNU_STACK)
>
> The first two are clearly defined, but the third needed a lot of special
> handling. For a "safe by default" world, the third should be "NX", but
> old stuff depended on it being "X".
>
> Here, we have a bit being present or not, so we only have a binary
> state. I'd much rather the default be NX (no bit set) instead of making
> every future (safe) user of memfd have to specify MFD_NOEXEC.
>
> It's also easier on a filtering side to say "disallow memfd_create with
> MFD_EXEC", but how do we deal with the older software?
>
> If the default perms of memfd_create()'s exec bit is controlled by a
> sysctl and the sysctl is set to "leave it executable", how does a user
> create an NX memfd? (i.e. setting MFD_EXEC means "exec" and not setting
> it means "exec" also.) Are two bits needed? Seems wasteful.
> MFD_I_KNOW_HOW_TO_SET_EXEC | MFD_EXEC, etc...
>
Great points,  with those questions and usages in mind, I m thinking below:

1> memfd_create:
Add two flags:
#define MFD_EXEC                      0x0008
#define MFD_NOEXEC _SEAL    0x0010
This lets application to set executable bit explicitly.
(If application set both, it will be rejected)

2> For old application that doesn't set executable bit:
Add a pid name-spaced sysctl.kernel.pid_mfd_noexec, with:
value = 0: Default_EXEC
     Honor MFD_EXEC and MFD_NOEXEC_SEAL
     When none is set, will fall back to original behavior (EXEC)
value = 1: Default_NOEXEC_SEAL
      Honor MFD_EXEC and MFD_NOEXEC_SEAL
      When none is set, will default to MFD_NOEXEC_SEAL

3> Add a pid name-spaced sysctl kernel.pid_mfd_noexec_enforced: with:
value = 0: default, not enforced.
value = 1: enforce NOEXEC_SEAL (overwrite everything)

Then we can use and secure memfd at host and container as below:
At host level:
Case A> In secure by default system where doesn't allow executable memfd:
sysctl.kernel.pid_mfd_noexec_enforced = 1
LSM to block creation of executable memfd  system wide.
This requires a new hook: secure_memfd_create

Case B> In system that need both (runc case),
use sysctl kernel.pid_mfd_noexec = 0/1 during converting application to new API.
SELINUX or landlock to sandbox the process.(requires work).

At container level:
It would be nice for container to control creation of executable memfd too.
This is through  sysctl kernel.pid_mfd_noexec_enforced
This lets runc to create two type of contains:
one with ability to create executable memfd, one without.

The sysctl.kernel.pid_mfd_noexec sets the default value, it is helpful
during  applications are being migrated to set the executable bit.
 Alternatively, we can have a new syscall: memfd_create2, where it is mandatary
to set executable bit (or default to NOEXEC_SEAL),  then
sysctl.kernel.pid_mfd_noexec
is not needed.

> For F_SEAL_EXEC, it seems this should imply F_SEAL_WRITE if forced
> executable to avoid WX mappings (i.e. provide W^X from the start).
>
Yes. I agree.

Thanks!
Best regards,
Jeff Xu


> -Kees
>
> [1] https://bugs.chromium.org/p/chromium/issues/list?q=type%3Dbug-security%20memfd%20escalation&can=1
> [2] https://lwn.net/Articles/781013/
>
> --
> Kees Cook

next prev parent reply	other threads:[~2022-11-01 23:14 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-05 22:21 jeffxu
2022-08-05 22:21 ` [PATCH v2 1/5] mm/memfd: add F_SEAL_EXEC jeffxu
2022-08-05 22:21 ` [PATCH v2 2/5] mm/memfd: add MFD_NOEXEC flag to memfd_create jeffxu
2022-08-05 22:21 ` [PATCH v2 3/5] selftests/memfd: add tests for F_SEAL_EXEC jeffxu
2022-08-05 22:21 ` [PATCH v2 4/5] selftests/memfd: add tests for MFD_NOEXEC jeffxu
2022-08-05 22:21 ` [PATCH v2 5/5] sysctl: add support for mfd_noexec jeffxu
2022-08-13 18:35   ` kernel test robot
2022-08-13 19:06   ` kernel test robot
2022-08-08 17:46 ` [PATCH v2 0/5] mm/memfd: MFD_NOEXEC for memfd_create Kees Cook
2022-11-01 23:14   ` Jeff Xu [this message]
2022-11-02  2:45     ` Kees Cook
2022-11-02 17:18       ` Jeff Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABi2SkVXMUVhSTJezfHt_BKxyKP+x++9oveuB3qJZL7N672UKw@mail.gmail.com \
    --to=jeffxu@chromium.org \
    --cc=akpm@linux-foundation.org \
    --cc=brauner@kernel.org \
    --cc=cyphar@cyphar.com \
    --cc=dev@opencontainers.org \
    --cc=dmitry.torokhov@gmail.com \
    --cc=dverkamp@chromium.org \
    --cc=hughd@google.com \
    --cc=jannh@google.com \
    --cc=jeffxu@google.com \
    --cc=jorgelo@chromium.org \
    --cc=keescook@chromium.org \
    --cc=linux-hardening@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mnissler@chromium.org \
    --cc=skhan@linuxfoundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox