From: Topi Miettinen <toiwoton@gmail.com>
To: Catalin Marinas <catalin.marinas@arm.com>,
Kees Cook <keescook@chromium.org>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
"Christoph Hellwig" <hch@infradead.org>,
"Lennart Poettering" <lennart@poettering.net>,
"Zbigniew Jędrzejewski-Szmek" <zbyszek@in.waw.pl>,
"Will Deacon" <will@kernel.org>,
"Alexander Viro" <viro@zeniv.linux.org.uk>,
"Eric Biederman" <ebiederm@xmission.com>,
"Szabolcs Nagy" <szabolcs.nagy@arm.com>,
"Mark Brown" <broonie@kernel.org>,
"Jeremy Linton" <jeremy.linton@arm.com>,
linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org,
linux-abi-devel@lists.sourceforge.net,
linux-hardening@vger.kernel.org, "Jann Horn" <jannh@google.com>,
"Salvatore Mesoraca" <s.mesoraca16@gmail.com>,
"Igor Zhbanov" <izh1979@gmail.com>
Subject: Re: [PATCH RFC 0/4] mm, arm64: In-kernel support for memory-deny-write-execute (MDWE)
Date: Wed, 20 Apr 2022 22:34:33 +0300
Message-ID: <c62170c6-5993-2417-4143-5a37a98b227c@gmail.com>
In-Reply-To: <YmAEDsGtxhim46UI@arm.com>
On 20.4.2022 16.01, Catalin Marinas wrote:
> On Thu, Apr 14, 2022 at 11:52:17AM -0700, Kees Cook wrote:
>> On Wed, Apr 13, 2022 at 02:49:42PM +0100, Catalin Marinas wrote:
>>> The background to this is that systemd has a configuration option called
>>> MemoryDenyWriteExecute [1], implemented as a SECCOMP BPF filter. Its aim
>>> is to prevent a user task from inadvertently creating an executable
>>> mapping that is (or was) writeable. Since such BPF filter is stateless,
>>> it cannot detect mappings that were previously writeable but
>>> subsequently changed to read-only. Therefore the filter simply rejects
>>> any mprotect(PROT_EXEC). The side-effect is that on arm64 with BTI
>>> support (Branch Target Identification), the dynamic loader cannot change
>>> an ELF section from PROT_EXEC to PROT_EXEC|PROT_BTI using mprotect().
>>> For libraries, it can resort to unmapping and re-mapping but for the
>>> main executable it does not have a file descriptor. The original bug
>>> report in the Red Hat bugzilla - [2] - and subsequent glibc workaround
>>> for libraries - [3].
>>
>> Right, so, the systemd filter is a big hammer solution for the kernel
>> not having a very easy way to provide W^X mapping protections to
>> userspace. There's stuff in SELinux, and there have been several
>> attempts[1] at other LSMs to do it too, but nothing stuck.
>>
>> Given the filter, and the implementation of how to enable BTI, I see two
>> solutions:
>>
>> - provide a way to do W^X so systemd can implement the feature differently
>> - provide a way to turn on BTI separate from mprotect to bypass the filter
>>
>> I would agree, the latter seems like the greater hack,
>
> We discussed such hacks in the past but they are just working around the
> fundamental issue - systemd wants W^X but with BPF it can only achieve
> it by preventing mprotect(PROT_EXEC) irrespective of whether the mapping
> was already executable. If we find a better solution for W^X, we
> wouldn't have to hack anything for mprotect(PROT_EXEC|PROT_BTI).
>
>> so I welcome
>> this RFC, though I think it might need to explore a bit of the feature
>> space exposed by other solutions[1] (i.e. see SARA and NAX), otherwise
>> it risks being too narrowly implemented. For example, playing well with
>> JITs should be part of the design, and will likely need some kind of
>> ELF flags and/or "sealing" mode, and to handle the vma alias case as
>> Jann Horn pointed out[2].
>
> I agree we should look at what we want to cover, though trying to avoid
> re-inventing SELinux. With this patchset I went for the minimum that
> systemd MDWE does with BPF.
>
> I think JITs get around it using something like memfd with two separate
> mappings to the same page. We could try to prevent such aliases but
> allow it if an ELF note is detected (or get the JIT to issue a prctl()).
>
> Anyway, with a prctl() we can allow finer-grained control starting with
> anonymous and file mappings and later extending to vma aliases,
> writeable files etc. On top we can add a seal mask so that a process
> cannot disable a control that was set. Something like (I'm not good at
> names):
>
> prctl(PR_MDWX_SET, flags, seal_mask);
> prctl(PR_MDWX_GET);
>
> with flags like:
>
> PR_MDWX_MMAP - basics, should cover mmap() and mprotect()
> PR_MDWX_ALIAS - vma aliases, allowed with an ELF note
> PR_MDWX_WRITEABLE_FILE
>
> (needs some more thinking)
>

For systemd, feature compatibility with the BPF version is important so
that we can switch automatically to the kernel version once it is
available, without regressions. So I think PR_MDWX_MMAP (or maybe
PR_MDWX_COMPAT) should match exactly what MemoryDenyWriteExecute=yes
does as implemented with BPF: only forbid mmap(PROT_EXEC|PROT_WRITE)
and mprotect(PROT_EXEC). As with BPF, once installed there should be no
way to escape, and ELF flags should also be ignored. ARM BTI should be
allowed, though (allow PROT_EXEC|PROT_BTI if the old flags had
PROT_EXEC).
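
To make the compat rule concrete, here is a rough userspace sketch of
the decision, not code from the patchset. The helper names are
hypothetical, and the BTI exception is only one possible reading of
"allow PROT_EXEC|PROT_BTI if the old flags had PROT_EXEC": I assume the
new flags must also not include PROT_WRITE.

```c
#include <stdbool.h>
#include <sys/mman.h>

/*
 * PROT_BTI only exists on arm64. Fall back to the arm64 UAPI value so
 * that this sketch also builds on other architectures.
 */
#ifndef PROT_BTI
#define PROT_BTI 0x10
#endif

/* Hypothetical helper: would mmap() with these protections be refused? */
static inline bool mdwx_compat_deny_mmap(int prot)
{
	/* Same as the BPF filter: refuse writable and executable at once. */
	return (prot & PROT_EXEC) && (prot & PROT_WRITE);
}

/*
 * Hypothetical helper: would mprotect() from old_prot to new_prot be
 * refused? Unlike the stateless BPF filter, this sees the old flags.
 */
static inline bool mdwx_compat_deny_mprotect(int old_prot, int new_prot)
{
	/* Requests without PROT_EXEC are never refused. */
	if (!(new_prot & PROT_EXEC))
		return false;

	/*
	 * Assumed BTI exception: the mapping is already executable, the
	 * request adds PROT_BTI and does not ask for PROT_WRITE.
	 */
	if ((old_prot & PROT_EXEC) && (new_prot & PROT_BTI) &&
	    !(new_prot & PROT_WRITE))
		return false;

	/* Everything else matches the BPF rule: mprotect(PROT_EXEC) fails. */
	return true;
}
```

With this, mprotect() from PROT_READ|PROT_EXEC to
PROT_READ|PROT_EXEC|PROT_BTI passes, while any attempt to make a
non-executable mapping executable is still refused, as with the BPF
filter.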

Then we could have improved versions (other PR_MDWX_ prctls) with many
more checks. These could be enabled with MemoryDenyWriteExecute=strict
or similar.

More relaxed versions (like SARA) could also be interesting, for
example for a system service running Python with FFI, or perhaps a JVM.
These could be enabled with, for example,
MemoryDenyWriteExecute=trampolines. That way even those programs would
get some protection (though there would be a gap in the defences).

-Topi