linux-mm.kvack.org archive mirror
From: Dmitry Vyukov <dvyukov@google.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: David Hildenbrand <david@redhat.com>,
	fw@deneb.enyo.de,  James.Bottomley@hansenpartnership.com,
	Liam.Howlett@oracle.com,  akpm@linux-foundation.org,
	arnd@arndb.de, brauner@kernel.org,  chris@zankel.net,
	deller@gmx.de, hch@infradead.org, ink@jurassic.park.msu.ru,
	 jannh@google.com, jcmvbkbc@gmail.com, jeffxu@chromium.org,
	 jhubbard@nvidia.com, linux-alpha@vger.kernel.org,
	linux-api@vger.kernel.org,  linux-arch@vger.kernel.org,
	linux-kernel@vger.kernel.org,  linux-kselftest@vger.kernel.org,
	linux-mips@vger.kernel.org,  linux-mm@kvack.org,
	linux-parisc@vger.kernel.org, mattst88@gmail.com,
	 muchun.song@linux.dev, paulmck@kernel.org,
	richard.henderson@linaro.org,  shuah@kernel.org,
	sidhartha.kumar@oracle.com, surenb@google.com,
	 tsbogend@alpha.franken.de, vbabka@suse.cz, willy@infradead.org,
	 elver@google.com, Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH v2 0/5] implement lightweight guard pages
Date: Wed, 23 Oct 2024 10:56:33 +0200
Message-ID: <CACT4Y+ZE9Zco7KaQoT50aooXCHxhz2N_psTAFtT+ZrH14Si7aw@mail.gmail.com>
In-Reply-To: <5a3d3bc8-60db-46d0-b689-9aeabcdb8eab@lucifer.local>

On Wed, 23 Oct 2024 at 10:12, Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> +cc Linus as I reference a commit of his below...
>
> On Wed, Oct 23, 2024 at 09:19:03AM +0200, David Hildenbrand wrote:
> > On 23.10.24 08:24, Dmitry Vyukov wrote:
> > > Hi Florian, Lorenzo,
> > >
> > > This looks great!
>
> Thanks!
>
> > >
> > > What I am VERY interested in is if poisoned pages cause SIGSEGV even when
> > > the access happens in the kernel. Namely, the syscall still returns EFAULT,
> > > but also SIGSEGV is queued on return to user-space.
>
> Yeah we don't in any way.
>
> I think adding something like this would be a bit of its own project.

I can totally understand this.

> The fault handler for this is in handle_pte_marker() in mm/memory.c, where
> we do the following:
>
>         /* Hitting a guard page is always a fatal condition. */
>         if (marker & PTE_MARKER_GUARD)
>                 return VM_FAULT_SIGSEGV;
>
> So basically we pass this back to whoever invoked the fault. For uaccess we
> end up in arch-specific code that eventually checks exception tables
> etc. and for x86-64 that's kernelmode_fixup_or_oops().
>
> There used to be a sig_on_uaccess_err in the x86-specific thread_struct
> that let you propagate it but Linus pulled it out in commit 02b670c1f88e
> ("x86/mm: Remove broken vsyscall emulation code from the page fault code")
> where it was presumably used for vsyscall.
>
> Of course we could just get something much higher up the stack to send the
> signal, but we'd need to be careful we weren't breaking anything doing
> it...

Can setting TIF_NOTIFY_RESUME and then doing the rest when returning
to userspace help here?
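
To make the question concrete, here is a toy model of the idea in plain
user-space C. It is not kernel code: struct toy_task and both functions
are hypothetical names invented for illustration, and the flag field only
stands in for TIF_NOTIFY_RESUME. The sketch assumes the syscall keeps
failing with -EFAULT exactly as it does today, and that the signal is
raised later, once, on the exit-to-user path:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-task state; not a real kernel struct. */
struct toy_task {
	bool notify_resume;       /* models TIF_NOTIFY_RESUME being set */
	bool guard_fault_pending; /* a guard page was hit during a syscall */
	int signals_delivered;    /* SIGSEGVs raised on exit to user space */
};

/* Models a uaccess that may hit a guard page: fail now, signal later. */
int toy_copy_from_user(struct toy_task *t, bool hits_guard)
{
	assert(t != NULL);
	if (!hits_guard)
		return 0;
	t->guard_fault_pending = true;
	t->notify_resume = true;
	return -EFAULT; /* the syscall itself still fails as it does today */
}

/* Models the return-to-userspace path: do the deferred work exactly once. */
void toy_exit_to_user(struct toy_task *t)
{
	assert(t != NULL);
	if (!t->notify_resume)
		return;
	t->notify_resume = false;
	if (t->guard_fault_pending) {
		t->guard_fault_pending = false;
		t->signals_delivered++;
	}
}
```

The point of splitting it this way is that nothing in the fault path
itself has to know how to deliver a signal; it only records that one is
owed.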

> I address GUP below.
>
> > >
> > > Catching bad accesses in system calls is currently the weak spot for
> > > all user-space bug detection tools (GWP-ASan, libefence, libefency, etc).
> > > It's almost possible with userfaultfd, but catching faults in the kernel
> > > requires admin capability, so not really an option for generic bug
> > > detection tools (+inconvenience of userfaultfd setup/handler).
> > > Intercepting all EFAULT from syscalls is not generally possible
> > > (w/o ptrace, usually not an option as well), and EFAULT does not always
> > > mean a bug.
> > >
> > > Triggering SIGSEGV even in syscalls would be not just a performance
> > > optimization, but a new useful capability that would allow it to catch
> > > more bugs.
> >
> > Right, we discussed that offline also as a possible extension to the
> > userfaultfd SIGBUS mode.
> >
> > I did not look into that yet, but I was wondering if there could be cases where
> > a different process could trigger that SIGSEGV, and how to (and if to)
> > handle that.
> >
> > For example, ptrace (access_remote_vm()) -> GUP likely can trigger that. I
> > think with userfaultfd() we will currently return -EFAULT, because we call
> > get_user_page_vma_remote() that is not prepared for dropping the mmap lock.
> > Possibly that is the right thing to do, but not sure :)

That's a good corner case.
I guess also process_vm_readv/writev.
Not triggering the signal in these cases looks like the right thing to do.
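
As a sketch of that rule, assuming the decision can be made from the
fault result plus the fault flags (the FOLL_REMOTE -> FAULT_FLAG_REMOTE
distinction David mentions). The TOY_* constants and the function name
are hypothetical and their values are illustrative only; they are not
the kernel's definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; the real kernel flags are defined elsewhere. */
#define TOY_FAULT_FLAG_REMOTE (1u << 0)
#define TOY_VM_FAULT_SIGSEGV  (1u << 1)

/*
 * Decide whether a fault taken during a kernel access should also queue
 * SIGSEGV for the current task. Remote accesses (ptrace,
 * process_vm_readv/writev) never do: the faulting address belongs to
 * another process, so the caller only sees -EFAULT.
 */
bool toy_should_queue_sigsegv(unsigned int fault_result,
			      unsigned int fault_flags)
{
	if (!(fault_result & TOY_VM_FAULT_SIGSEGV))
		return false;
	if (fault_flags & TOY_FAULT_FLAG_REMOTE)
		return false;
	return true;
}
```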

> > These "remote" faults set FOLL_REMOTE -> FAULT_FLAG_REMOTE, so we might be
> > able to distinguish them and perform different handling.
>
> So all GUP will return -EFAULT when hitting guard pages unless we change
> something.
>
> In GUP we handle this in faultin_page():
>
>         if (ret & VM_FAULT_ERROR) {
>                 int err = vm_fault_to_errno(ret, flags);
>
>                 if (err)
>                         return err;
>                 BUG();
>         }
>
> And vm_fault_to_errno() is:
>
> static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
> {
>         if (vm_fault & VM_FAULT_OOM)
>                 return -ENOMEM;
>         if (vm_fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE))
>                 return (foll_flags & FOLL_HWPOISON) ? -EHWPOISON : -EFAULT;
>         if (vm_fault & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV))
>                 return -EFAULT;
>         return 0;
> }
>
> Again, I think if we wanted special handling here we'd probably need to
> propagate that fault from higher up. And yes, for one, we'd definitely
> need to not do so if it's remote, but I worry about other cases.
>
> >
> > --
> > Cheers,
> >
> > David / dhildenb
> >
>
> Overall while I sympathise with this, it feels dangerous and a pretty major
> change, because there'll be something somewhere that will break because it
> expects faults to be swallowed that we no longer do swallow.
>
> So I'd say it'd be something we should defer, but of course it's a highly
> user-facing change so how easy that would be I don't know.
>
> But I definitely don't think a 'introduce the ability to do cheap PROT_NONE
> guards' series is the place to also fundamentally change how user access
> page faults are handled within the kernel :)

Will delivering signals on kernel access be a backwards compatible
change? Or will we need a different API? MADV_GUARD_POISON_KERNEL?
It's just somewhat painful to detect/update all userspace if we add
this feature in the future. Can we say signal delivery on kernel accesses
is unspecified?
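
For reference, the behaviour user space can observe today, and might
come to depend on, can be shown with a small user-space sketch. It uses
a plain PROT_NONE mapping as a stand-in for a guard page, on the
assumption that the madvise() flag from this series is not yet in
installed headers; the function name is made up for illustration. The
kernel touches the page on behalf of write(2), the call fails with
EFAULT, and no signal is delivered:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns the errno reported when write(2) is asked to copy from an
 * inaccessible page, 0 if the write unexpectedly succeeded, or -1 if
 * the setup (mmap/pipe) failed.
 */
int errno_from_write_of_inaccessible_page(void)
{
	long page = sysconf(_SC_PAGESIZE);
	int fds[2];
	int err = 0;
	char *guard;

	assert(page > 0);

	/* PROT_NONE stands in for a guard page in this sketch. */
	guard = mmap(NULL, (size_t)page, PROT_NONE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (guard == MAP_FAILED)
		return -1;
	if (pipe(fds) != 0) {
		munmap(guard, (size_t)page);
		return -1;
	}

	/* The kernel, not user code, reads the page here. */
	if (write(fds[1], guard, 1) < 0)
		err = errno;

	close(fds[0]);
	close(fds[1]);
	munmap(guard, (size_t)page);
	return err;
}
```

A pipe is used rather than /dev/null because the write has to actually
copy from the user buffer for the fault to happen.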

