linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Catalin Marinas <catalin.marinas@arm.com>
To: Andrew Murray <andrew.murray@arm.com>
Cc: Andrey Konovalov <andreyknvl@google.com>,
	vincenzo.frascino@arm.com, linux-arm-kernel@lists.infradead.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org,
	Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Subject: Re: [RFC] arm64: Detecting tagged addresses
Date: Wed, 26 Jun 2019 18:45:03 +0100	[thread overview]
Message-ID: <20190626174502.GH29672@arrakis.emea.arm.com> (raw)
In-Reply-To: <20190619121619.GV20984@e119886-lin.cambridge.arm.com>

Hi Andrew,

Cc'ing Luc (sparse maintainer) who's been involved in the past
discussions around static checking of user pointers:

https://lore.kernel.org/linux-arm-kernel/20180905190316.a34yycthgbamx2t3@ltop.local/

So I think the difference here from the previous approach is that we
explicitly mark functions that cannot take tagged addresses (like
find_vma()) and identify the callers.

More comments below:

On Wed, Jun 19, 2019 at 01:16:20PM +0100, Andrew Murray wrote:
> The proposed introduction of a relaxed ARM64 ABI [1] will allow tagged memory
> addresses to be passed through the user-kernel syscall ABI boundary. Tagged
> memory addresses are those which contain a non-zero top byte (the hardware
> has always ignored this top byte due to TCR_EL1.TBI0) and may be useful
> for features such as HWASan.
> 
> To permit this relaxation a proposed patchset [2] strips the top byte (tag)
> from user provided memory addresses prior to use in kernel functions which
> require untagged addresses (for example comparasion/arithmetic of addresses).
> The author of this patchset relied on a variety of techniques [2] (such as
> grep, BUG_ON, sparse etc) to identify as many instances of possible where
> tags need to be stipped.
> 
> To support this effort and to catch future regressions (e.g. in new syscalls
> or ioctls), I've devised an additional approach for detecting the use of
> tagged addresses in functions that do not want them. This approach makes
> use of Smatch [3] and is outlined in this RFC. Due to the ability of Smatch
> to do flow analysis I believe we can annotate the kernel in fewer places
> than a similar approach in sparse.
> 
> I'm keen for feedback on the likely usefulness of this approach.
> 
> We first add some new annotations that are exclusively consumed by Smatch:
> 
> --- a/include/linux/compiler_types.h
> +++ b/include/linux/compiler_types.h
> @@ -19,6 +19,7 @@
>  # define __cond_lock(x,c)      ((c) ? ({ __acquire(x); 1; }) : 0)
>  # define __percpu      __attribute__((noderef, address_space(3)))
>  # define __rcu         __attribute__((noderef, address_space(4)))
> +# define __untagged    __attribute__((address_space(5)))
>  # define __private     __attribute__((noderef))
>  extern void __chk_user_ptr(const volatile void __user *);
>  extern void __chk_io_ptr(const volatile void __iomem *);
[...]
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2224,7 +2224,7 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
>  EXPORT_SYMBOL(get_unmapped_area);
>  
>  /* Look up the first VMA which satisfies  addr < vm_end,  NULL if none. */
> -struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
> +struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long __untagged addr)
>  {
>         struct rb_node *rb_node;
>         struct vm_area_struct *vma;
[...]
> This can be further improved - the problem here is that for a given function,
> e.g. find_vma we look for callers where *any* of the parameters
> passed to find_vma are tagged addresses from userspace - i.e. not *just*
> the annotated parameter. This is also true for find_vma's callers' callers'.
> This results in the call tree having false positives.
> 
> It *is* possible to track parameters (e.g. find_vma arg 1 comes from arg 3 of
> do_pages_stat_array etc), but this is limited as if functions modify the
> data then the tracking is stopped (however this can be fixed).
[...]
> An example of a false positve is do_mlock. We untag the address and pass that
> to apply_vma_lock_flags - however we also pass a length - because the length
> came from userspace and could have the top bits set - it's flagged. However
> with improved parameter tracking we can remove this false positive and similar.

Could we track only the conversions from __user * that eventually end up
as __untagged? (I'm not familiar with smatch, so not sure what it can
do). We could assume that an unsigned long argument to a syscall is
default __untagged, unless explicitly marked as __tagged. For example,
sys_munmap() is allowed to take a tagged address.

> Prior to smatch I attempted a similar approach with sparse - however it seemed
> necessary to propogate the __untagged annotation in every function up the call tree,
> and resulted in adding the __untagged annotation to functions that would never
> get near user provided data. This leads to a littering of __untagged all over the
> kernel which doesn't seem appealing.

Indeed. We attempted this last year (see the above thread).

> Smatch is more capable, however it almost
> certainly won't pick up 100% of issues due to the difficulity of making flow
> analysis understand everything a compiler can.
> 
> Is it likely to be acceptable to use the __untagged annotation in user-path
> functions that require untagged addresses across the kernel?

If it helps with identifying missing untagged_addr() calls, I would say
yes (as long as we keep them to a minimum).

> [1] https://lkml.org/lkml/2019/6/13/534
> [2] https://patchwork.kernel.org/cover/10989517/
> [3] http://smatch.sourceforge.net/

-- 
Catalin


  reply	other threads:[~2019-06-26 17:45 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-19 12:16 Andrew Murray
2019-06-26 17:45 ` Catalin Marinas [this message]
2019-06-27 13:18   ` Andrew Murray
2019-06-29  7:17     ` Vincenzo Frascino

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190626174502.GH29672@arrakis.emea.arm.com \
    --to=catalin.marinas@arm.com \
    --cc=andrew.murray@arm.com \
    --cc=andreyknvl@google.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luc.vanoostenryck@gmail.com \
    --cc=vincenzo.frascino@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox