From: Topi Miettinen <toiwoton@gmail.com>
To: Catalin Marinas <catalin.marinas@arm.com>,
David Hildenbrand <david@redhat.com>
Cc: "Joey Gouly" <joey.gouly@arm.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Lennart Poettering" <lennart@poettering.net>,
"Zbigniew Jędrzejewski-Szmek" <zbyszek@in.waw.pl>,
"Alexander Viro" <viro@zeniv.linux.org.uk>,
"Kees Cook" <keescook@chromium.org>,
"Szabolcs Nagy" <szabolcs.nagy@arm.com>,
"Mark Brown" <broonie@kernel.org>,
"Jeremy Linton" <jeremy.linton@arm.com>,
linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org,
linux-abi-devel@lists.sourceforge.net, nd@arm.com,
shuah@kernel.org
Subject: Re: [PATCH v2 1/2] mm: Implement memory-deny-write-execute as a prctl
Date: Mon, 23 Jan 2023 19:48:40 +0200
Message-ID: <0ec67737-3c09-ba5c-f840-9ed02a0ea6bf@gmail.com>
In-Reply-To: <Y86wA0s/HRVtqLru@arm.com>
On 23.1.2023 18.04, Catalin Marinas wrote:
> On Mon, Jan 23, 2023 at 01:53:46PM +0100, David Hildenbrand wrote:
>> On 23.01.23 13:19, Catalin Marinas wrote:
>>> On Mon, Jan 23, 2023 at 12:45:50PM +0100, David Hildenbrand wrote:
>>>> On 19.01.23 17:03, Joey Gouly wrote:
>>>>> diff --git a/include/linux/mman.h b/include/linux/mman.h
>>>>> index 58b3abd457a3..cee1e4b566d8 100644
>>>>> --- a/include/linux/mman.h
>>>>> +++ b/include/linux/mman.h
>>>>> @@ -156,4 +156,38 @@ calc_vm_flag_bits(unsigned long flags)
>>>>> }
>>>>> unsigned long vm_commit_limit(void);
>>>>> +
>>>>> +/*
>>>>> + * Denies creating a writable executable mapping or gaining executable permissions.
>>>>> + *
>>>>> + * This denies the following:
>>>>> + *
>>>>> + *   a) mmap(PROT_WRITE | PROT_EXEC)
>>>>> + *
>>>>> + *   b) mmap(PROT_WRITE)
>>>>> + *      mprotect(PROT_EXEC)
>>>>> + *
>>>>> + *   c) mmap(PROT_WRITE)
>>>>> + *      mprotect(PROT_READ)
>>>>> + *      mprotect(PROT_EXEC)
>>>>> + *
>>>>> + * But allows the following:
>>>>> + *
>>>>> + *   d) mmap(PROT_READ | PROT_EXEC)
>>>>> + *      mmap(PROT_READ | PROT_EXEC | PROT_BTI)
>>>>> + */
>>>>
>>>> Shouldn't we clear VM_MAYEXEC at mmap() time such that we cannot set VM_EXEC
>>>> anymore? In an ideal world, there would be no further mprotect changes
>>>> required.
>>>
>>> I don't think it works for this scenario. We don't want to disable
>>> PROT_EXEC entirely, only disallow it if the mapping is not already
>>> executable. The below should be allowed:
>>>
>>> addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
>>> mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
>>>
>>> but IIUC what you meant, it fails if we cleared VM_MAYEXEC at mmap()
>>> time.
>>
>> Yeah, if you allow write access at mmap time, clear VM_MAYEXEC (and disallow
>> VM_EXEC of course).
>
> This should work but it doesn't fully mimic systemd's MDWE behaviour
> (e.g. disallow mprotect(PROT_EXEC) even if the mmap was PROT_READ only).
> Topi wanted to stay close to that at least in the first incarnation of
> this control (can be extended later).
>
>> But I guess we'd have to go one step further: if we allow exec access
>> at mmap time, clear VM_MAYWRITE (and disallow VM_WRITE of course).
>
> Yes, both this and the VM_MAYEXEC clearing if VM_WRITE would be useful
> but as additional controls a process can enable.
>
>> That at least would be then similar to how we handle mmaped files: if the
>> file is not executable, we clear VM_MAYEXEC. If the file is not writable, we
>> clear VM_MAYWRITE.
>
> We still allow VM_MAYWRITE for private mappings, though we do clear
> VM_MAYEXEC if not executable.
>
> It would be nice to use VM_MAY* flags for this logic but we can only
> emulate MDWE if we change the semantics of 'MAY': only check the 'MAY'
> flags for permissions being changed (e.g. allow PROT_EXEC if the vma is
> already VM_EXEC even if !VM_MAYEXEC). Another issue is that we end up
> with some weird combinations like having VM_EXEC without VM_MAYEXEC
> (maybe that's fine).
>
>> Clearing VM_MAYWRITE would imply that also writes via /proc/self/mem to such
>> memory would be forbidden, which might also be what we are trying to
>> achieve, or is that expected to still work?
>
> I think currently with systemd's MDWE it still works (I haven't tried
> though), unless there's something else forcing that file read-only.
>
>> But clearing VM_MAYWRITE would mean that is_cow_mapping() would no
>> longer fire for some VMAs, and we'd have to check if that's fine in
>> all cases.
>
> This will break __access_remote_vm() AFAICT since it can't do a CoW on
> read-only private mapping.
>
>> Having said that, this patch handles the case when the prctl is applied to a
>> process after already having created some writable or executable mappings,
>> to at least forbid it afterwards on these mappings. What is expected to
>> happen if the process already has writable mappings that are executable at
>> the time we enable the prctl?
>
> They are expected to continue to work. The prctl() is meant to be
> invoked by something like systemd so that any subsequent exec() will
> inherit the property.
>
>> Clarifying what the expected semantics with /proc/self/mem are would be
>> nice.
>
> Yeah, this series doesn't handle this. Topi, do you know if systemd does
> anything about /proc/self/mem? To me this option is more about catching
> inadvertent write|exec mappings rather than blocking programs that
> insist on doing this (they can always map a memfd file twice with
> separate write and exec attributes for example).
>
I don't think systemd does anything about /proc/self/mem. For 100%
compatibility with the seccomp-based filter, the same cases of mprotect()
use should be blocked regardless of the file descriptor used. There could
be more relaxed PR_MDWE_* controls in the future if needed.
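
To make "the same cases" concrete, here is a minimal userspace model of the
deny rule described in the quoted comment (cases a to d). It is only a
sketch: mdwe_denies() and VM_BTI_PLACEHOLDER are hypothetical names invented
for this illustration, the flag values are stand-ins, and the real check
lives in the kernel patch rather than in code like this. For a fresh mmap()
the model assumes old_flags equals new_flags, since the new VMA already
carries the requested flags when it is checked.

```c
#include <stdbool.h>

/* Stand-in permission bits for this model only (not a kernel header). */
#define VM_READ             0x1UL
#define VM_WRITE            0x2UL
#define VM_EXEC             0x4UL
/* Hypothetical placeholder for an arch bit such as BTI. */
#define VM_BTI_PLACEHOLDER  0x100UL

/*
 * Model of the MDWE rule: returns true if the request must be refused.
 *   old_flags: permissions the mapping has now (for a new mmap(), pass
 *              the same value as new_flags)
 *   new_flags: permissions being requested
 */
static bool mdwe_denies(unsigned long old_flags, unsigned long new_flags)
{
	/* Case a): writable and executable at the same time. */
	if ((new_flags & VM_EXEC) && (new_flags & VM_WRITE))
		return true;

	/* Cases b) and c): gaining exec on a mapping that lacked it. */
	if (!(old_flags & VM_EXEC) && (new_flags & VM_EXEC))
		return true;

	/* Case d): already executable, or exec not requested at all. */
	return false;
}
```

With this model, mdwe_denies(VM_READ, VM_EXEC) is true even though the
mapping was never writable, which is the systemd-like behaviour mentioned
earlier in the thread, while adding the placeholder BTI bit to an already
executable mapping is allowed.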
Updated systemd PR: https://github.com/systemd/systemd/pull/25276
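
For context, a caller such as a service manager would enable the control
with a single prctl() before exec(). The sketch below probes this in a
forked child, because the setting is not meant to be undone once applied.
Assumptions: mdwe_probe() is a hypothetical helper name, and the fallback
values for PR_SET_MDWE and PR_MDWE_REFUSE_EXEC_GAIN are the ones from the
mainline uapi header as later merged, which may differ from this v2 posting.
On kernels without the feature the prctl() simply fails and the probe
reports -1.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumed values, used only if the installed headers lack them. */
#ifndef PR_SET_MDWE
#define PR_SET_MDWE 65
#endif
#ifndef PR_MDWE_REFUSE_EXEC_GAIN
#define PR_MDWE_REFUSE_EXEC_GAIN 1
#endif

/*
 * Returns  1 if a W+X mmap() is refused after enabling MDWE,
 *          0 if it is still allowed,
 *         -1 if the prctl is unavailable or the probe could not run.
 */
static int mdwe_probe(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: enable MDWE, then try case a) from the patch comment. */
		if (prctl(PR_SET_MDWE, PR_MDWE_REFUSE_EXEC_GAIN, 0L, 0L, 0L) != 0)
			_exit(2);

		void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		_exit(p == MAP_FAILED ? 1 : 0);
	}

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;

	switch (WEXITSTATUS(status)) {
	case 0:
		return 0;
	case 1:
		return 1;
	default:
		return -1;
	}
}
```

A real service manager would call prctl() in the child right before
exec(), so that the executed program inherits the property as described in
the quoted discussion.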
I wish there were highly granular access controls for /proc, including
/proc/self and /proc/sys/*. Currently the best options are mount
namespaces and/or SELinux, but neither is well suited to that.
-Topi