linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Jeff Xu <jeffxu@chromium.org>
To: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Jeff Xu <jeffxu@google.com>, Matthew Wilcox <willy@infradead.org>,
	akpm@linux-foundation.org,  keescook@chromium.org,
	sroettger@google.com, jorgelo@chromium.org,  groeck@chromium.org,
	linux-kernel@vger.kernel.org,  linux-kselftest@vger.kernel.org,
	linux-mm@kvack.org, jannh@google.com,  surenb@google.com,
	alex.sierra@amd.com, apopple@nvidia.com,
	 aneesh.kumar@linux.ibm.com, axelrasmussen@google.com,
	ben@decadent.org.uk,  catalin.marinas@arm.com, david@redhat.com,
	dwmw@amazon.co.uk,  ying.huang@intel.com, hughd@google.com,
	joey.gouly@arm.com, corbet@lwn.net,  wangkefeng.wang@huawei.com,
	Liam.Howlett@oracle.com,  torvalds@linux-foundation.org,
	lstoakes@gmail.com, mawupeng1@huawei.com,  linmiaohe@huawei.com,
	namit@vmware.com, peterx@redhat.com,  peterz@infradead.org,
	ryan.roberts@arm.com, shr@devkernel.io, vbabka@suse.cz,
	 xiujianfeng@huawei.com, yu.ma@intel.com,
	zhangpeng362@huawei.com,  dave.hansen@intel.com, luto@kernel.org,
	linux-hardening@vger.kernel.org
Subject: Re: [RFC PATCH v1 0/8] Introduce mseal() syscall
Date: Mon, 23 Oct 2023 10:42:59 -0700	[thread overview]
Message-ID: <CABi2SkWNGEw=f63WDYZ1WNqkGsYUK-Hws9WwC0ZoeJQsRQ6GRg@mail.gmail.com> (raw)
In-Reply-To: <CAKbZUD12pEaDCLysOpT3yL3064=P28Pm3c=UBqhOZYeBP026WA@mail.gmail.com>

On Thu, Oct 19, 2023 at 3:47 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
>
> On Thu, Oct 19, 2023 at 6:30 PM Jeff Xu <jeffxu@google.com> wrote:
> >
> > Hi Pedro
> >
> > Some followup on mmap() + mprotect():
> >
> > On Wed, Oct 18, 2023 at 11:20 AM Jeff Xu <jeffxu@google.com> wrote:
> > >
> > > On Tue, Oct 17, 2023 at 3:35 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> > > >
> > > > > >
> > > > > > I think it's worth pointing out that this suggestion (with PROT_*)
> > > > > > could easily integrate with mmap() and as such allow for one-shot
> > > > > > mmap() + mseal().
> > > > > > If we consider the common case as 'addr = mmap(...); mseal(addr);', it
> > > > > > definitely sounds like a performance win as we halve the number of
> > > > > > syscalls for a sealed mapping. And if we trivially look at e.g OpenBSD
> > > > > > ld.so code, mmap() + mimmutable() and mprotect() + mimmutable() seem
> > > > > > like common patterns.
> > > > > >
> > > > > Yes. mmap() can support sealing as well, and memory is allocated as
> > > > > immutable from begining.
> > > > > This is orthogonal to mseal() though.
> > > >
> > > > I don't see how this can be orthogonal to mseal().
> > > > In the case we opt for adding PROT_ bits, we should more or less only
> > > > need to adapt calc_vm_prot_bits(), and the rest should work without
> > > > issues.
> > > > vma merging won't merge vmas with different prots. The current
> > > > interfaces (mmap and mprotect) would work just fine.
> > > > In this case, mseal() or mimmutable() would only be needed if you need
> > > > to set immutability over a range of VMAs with different permissions.
> > > >
> > > Agreed. By orthogonal, I meant we can have two APIs:
> > > mmap() and mseal()/mprotect()
> > > i.e. we can't just rely on mmap() only without mseal()/mprotect()/mimmutable().
> > > Sealing can be applied after initial memory creation.
> > >
> > > > Note: modifications should look kinda like this: https://godbolt.org/z/Tbjjd14Pe
> > > > The only annoying wrench in my plans here is that we have effectively
> > > > run out of vm_flags bits in 32-bit architectures, so this approach as
> > > > I described is not compatible with 32-bit.
> > > >
> > > > > In case of ld.so, iiuc, memory can be first allocated as W, then later
> > > > > changed to RO, for example, during symbol resolution.
> > > > > The important point is that the application can decide what type of
> > > > > sealing it wants, and when to apply it.  There needs to be an api(),
> > > > > that can be mseal() or mprotect2() or mimmutable(), the naming is not
> > > > > important to me.
> > > > >
> > > > > mprotect() in linux have the following signature:
> > > > > int mprotect(void addr[.len], size_t len, int prot);
> > > > > the prot bitmasks are all taken here.
> > > > > I have not checked the prot field in mmap(), there might be bits left,
> > > > > even not, we could have mmap2(), so that is not an issue.
> > > >
> > > > I don't see what you mean. We have plenty of prot bits left (32-bits,
> > > > and we seem to have around 8 different bits used).
> > > > And even if we didn't, prot is the same in mprotect and mmap and mmap2 :)
> > > >
> > > > The only issue seems to be that 32-bit ran out of vm_flags, but that
> > > > can probably be worked around if need be.
> > > >
> > > Ah, you are right about this. vm_flags is full, and prot in mprotect() is not.
> > > Apology that I was wrong previously and caused confusion.
> > >
> > > There is a slight difference in the syntax of mprotect and mseal.
> > > Each time when mprotect() is called, the kernel takes all of RWX bits
> > > and updates vm_flags,
> > > In other words, the application sets/unset each RWX, and kernel takes it.
> > >
> > > In the mseal() case, the kernel will remember which seal types were
> > > applied previously, and the application doesn’t need to repeat all
> > > existing seal types in the next mseal().  Once a seal type is applied,
> > > it can’t be unsealed.
> > >
> > > So if we want to use mprotect() for sealing, developers need to think
> > > of sealing bits differently than the rest of prot bits. It is a
> > > different programming model, might or might not be an obvious concept
> > > to developers.
> > >
> > This probably doesn't matter much to developers.
> > We can enforce the sealing bit to be the same as the rest of PROT bits.
> > If mprotect() tries to unset sealing, it will fail.
>
> Yep. Erroneous or malicious mprotects would all be caught. However, if
> we add a PROT_DOWNGRADEABLE (that could let you, lets say, mprotect()
> to less permissions or even downright munmap()) you'd want some care
> to preserve that bit when setting permissions.
>
> >
> > > There is a difference in input check and error handling as well.
> > > for mseal(), if a given address range has a gap (unallocated memory),
> > > or if one of VMA is sealed with MM_SEAL_SEAL flag, none of VMAs is
> > > updated.
> > > For mprotect(), some VMAs can be updated, till an error happens to a VMA.
> > >
> > This difference doesn't matter much.
> >
> > For mprotect()/mmap(), is Linux implementation limited by POSIX ?
>
> No. POSIX works merely as a baseline that UNIX systems aim towards.
> You can (and very frequently do) extend POSIX interfaces (in fact,
> it's how most of POSIX was written, through sheer
> "design-by-committee" on a bunch of UNIX systems' extensions).
>
> > This can be made backward compatible.
> > If there is no objection to adding linux specific values in mmap() and
> > mprotect(),
> > This works for me.
>
> Linux already has system-specific values for PROT_ (PROT_BTI,
> PROT_MTE, PROT_GROWSUP, PROT_GROWSDOWN, etc).
> Whether this is the right interface is another question. I do like it
> a lot, but there's of course value in being compatible with existing
> solutions (like mimmutable()).
>

Thanks Pedro for providing examples on mm extension to POSIX. This
opens more design options on solving the sealing problem. I will take
a few days to research  design options.

-Jeff


> --
> Pedro


  parent reply	other threads:[~2023-10-23 17:43 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-16 14:38 jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 1/8] Add mseal syscall jeffxu
2023-10-16 15:05   ` Greg KH
2023-10-17  6:50     ` Jeff Xu
2023-10-16 14:38 ` [RFC PATCH v1 2/8] Wire up " jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 3/8] mseal: add can_modify_mm and can_modify_vma jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 4/8] mseal: seal mprotect jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 5/8] mseal munmap jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 6/8] mseal mremap jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 7/8] mseal mmap jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 8/8] selftest mm/mseal mprotect/munmap/mremap/mmap jeffxu
2023-10-16 15:18 ` [RFC PATCH v1 0/8] Introduce mseal() syscall Matthew Wilcox
2023-10-17  8:34   ` Jeff Xu
2023-10-17 12:56     ` Matthew Wilcox
2023-10-17 15:29   ` Pedro Falcato
2023-10-17 21:33     ` Jeff Xu
2023-10-17 22:35       ` Pedro Falcato
2023-10-18 18:20         ` Jeff Xu
2023-10-19 17:30           ` Jeff Xu
2023-10-19 22:47             ` Pedro Falcato
2023-10-19 23:06               ` Linus Torvalds
2023-10-23 17:44                 ` Jeff Xu
2023-10-23 17:42               ` Jeff Xu [this message]
2023-10-16 17:23 ` Linus Torvalds
2023-10-17  9:07   ` Jeff Xu
2023-10-17 17:22     ` Linus Torvalds
2023-10-17 18:20       ` Theo de Raadt
2023-10-17 18:38         ` Linus Torvalds
2023-10-17 18:55           ` Theo de Raadt
2023-10-19  8:00           ` Stephen Röttger
2023-10-20 16:27             ` Theo de Raadt
2023-10-24 10:42               ` Stephen Röttger
2023-10-17 23:01         ` Jeff Xu
2023-10-17 23:56           ` Theo de Raadt
2023-10-18  3:18             ` Jeff Xu
2023-10-18  3:37               ` Theo de Raadt
2023-10-18 15:17               ` Matthew Wilcox
2023-10-18 18:54                 ` Jeff Xu
2023-10-18 20:36                   ` Theo de Raadt
2023-10-19  8:28                     ` Stephen Röttger
2023-10-20 15:55                       ` Theo de Raadt
2023-10-16 17:34 ` Jann Horn
2023-10-17  8:42   ` Jeff Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABi2SkWNGEw=f63WDYZ1WNqkGsYUK-Hws9WwC0ZoeJQsRQ6GRg@mail.gmail.com' \
    --to=jeffxu@chromium.org \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.sierra@amd.com \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=apopple@nvidia.com \
    --cc=axelrasmussen@google.com \
    --cc=ben@decadent.org.uk \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=dwmw@amazon.co.uk \
    --cc=groeck@chromium.org \
    --cc=hughd@google.com \
    --cc=jannh@google.com \
    --cc=jeffxu@google.com \
    --cc=joey.gouly@arm.com \
    --cc=jorgelo@chromium.org \
    --cc=keescook@chromium.org \
    --cc=linmiaohe@huawei.com \
    --cc=linux-hardening@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lstoakes@gmail.com \
    --cc=luto@kernel.org \
    --cc=mawupeng1@huawei.com \
    --cc=namit@vmware.com \
    --cc=pedro.falcato@gmail.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=ryan.roberts@arm.com \
    --cc=shr@devkernel.io \
    --cc=sroettger@google.com \
    --cc=surenb@google.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=wangkefeng.wang@huawei.com \
    --cc=willy@infradead.org \
    --cc=xiujianfeng@huawei.com \
    --cc=ying.huang@intel.com \
    --cc=yu.ma@intel.com \
    --cc=zhangpeng362@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox