linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Jann Horn <jannh@google.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Suren Baghdasaryan <surenb@google.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	 Matthew Wilcox <willy@infradead.org>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	 David Hildenbrand <david@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	 Muchun Song <muchun.song@linux.dev>,
	Richard Henderson <richard.henderson@linaro.org>,
	 Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
	Matt Turner <mattst88@gmail.com>,
	 Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	 "James E . J . Bottomley"
	<James.Bottomley@hansenpartnership.com>,
	Helge Deller <deller@gmx.de>,  Chris Zankel <chris@zankel.net>,
	Max Filippov <jcmvbkbc@gmail.com>, Arnd Bergmann <arnd@arndb.de>,
	 linux-alpha@vger.kernel.org, linux-mips@vger.kernel.org,
	 linux-parisc@vger.kernel.org, linux-arch@vger.kernel.org,
	 Shuah Khan <shuah@kernel.org>,
	Christian Brauner <brauner@kernel.org>,
	linux-kselftest@vger.kernel.org,
	 Sidhartha Kumar <sidhartha.kumar@oracle.com>,
	Jeff Xu <jeffxu@chromium.org>,
	 Christoph Hellwig <hch@infradead.org>,
	linux-api@vger.kernel.org,  John Hubbard <jhubbard@nvidia.com>
Subject: Re: [PATCH v2 3/5] mm: madvise: implement lightweight guard page mechanism
Date: Tue, 22 Oct 2024 21:08:53 +0200	[thread overview]
Message-ID: <CAG48ez3WS3EH9DuhE1b+7AX3+1=dVtd1M7y_5Ev4Shp2YxiYWg@mail.gmail.com> (raw)
In-Reply-To: <f2448c59-0456-49e8-9676-609629227808@suse.cz>

On Mon, Oct 21, 2024 at 10:46 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> On 10/21/24 22:27, Lorenzo Stoakes wrote:
> > On Mon, Oct 21, 2024 at 10:11:29PM +0200, Vlastimil Babka wrote:
> >> On 10/20/24 18:20, Lorenzo Stoakes wrote:
> >> > +  while (true) {
> >> > +          /* Returns < 0 on error, == 0 if success, > 0 if zap needed. */
> >> > +          err = walk_page_range_mm(vma->vm_mm, start, end,
> >> > +                                   &guard_poison_walk_ops, NULL);
> >> > +          if (err <= 0)
> >> > +                  return err;
> >> > +
> >> > +          /*
> >> > +           * OK some of the range have non-guard pages mapped, zap
> >> > +           * them. This leaves existing guard pages in place.
> >> > +           */
> >> > +          zap_page_range_single(vma, start, end - start, NULL);
> >>
> >> ... however the potentially endless loop doesn't seem great. Could a
> >> malicious program keep refaulting the range (ignoring any segfaults if it
> >> loses a race) with one thread while failing to make progress here with
> >> another thread? Is that ok because it would only punish itself?
> >
> > Sigh. Again, I don't think you've read the previous series have you? Or
> > even the changelog... I added this as Jann asked for it. Originally we'd
> > -EAGAIN if we got raced. See the discussion over in v1 for details.
> >
> > I did it that way specifically to avoid such things, but Jann didn't appear
> > to think it was a problem.
>
> If Jann is fine with this then it must be secure enough.

My thinking there was:

We can legitimately race with adjacent faults populating the area
we're operating on with THP pages; as long as the zapping and
poison-marker-setting are separate, *someone* will have to do the
retry. Either we do it in the kernel, or we tell userspace to handle
it, but having the kernel take care of it is preferable because it
makes the stable UAPI less messy.

One easy way to do it in the kernel would be to return -ERESTARTNOINTR
after the zap_page_range_single() instead of jumping back up, which in
terms of locking and signal handling and such would be equivalent to
looping in userspace (because really that's what -ERESTARTNOINTR does
- it returns out to userspace and moves the instruction pointer back
to restart the syscall). Though if we do that immediately, it might
make MADV_POISON unnecessarily slow, so we should probably retry once
before doing that. The other easy way is to just loop here.

The cond_resched() and pending fatal signal check mean that (except on
CONFIG_PREEMPT_NONE) the only differences between the current
implementation and looping in userspace are that we don't handle
non-fatal signals in between iterations and that we keep hogging the
mmap_lock in read mode. We do already have a bunch of codepaths that
retry on concurrent page table changes, like when zap_pte_range()
encounters a pte_offset_map_lock() failure; though I guess the
difference is that the retry on those is just a couple instructions,
which would be harder to race consistently, while here we redo walks
across the entire range, which should be fairly easy to race
repeatedly.

So I guess you have a point that this might be the easiest way to
stall other tasks that are trying to take mmap_lock for an extended
amount of time, I did not fully consider that... and then I guess you
could use that to slow down usercopy fault handling (once the lock
switches to handoff mode because of a stalled writer?) or slow down
other processes trying to read /proc/$pid/cmdline?

You can already indefinitely hog the mmap_lock with FUSE, though that
requires that you can mount a FUSE filesystem (which you wouldn't be
able in reasonably sandboxed code) and that you can find something
like a pin_user_pages() call that can't drop the mmap lock in between,
and there aren't actually that many of those...

So I guess you have a point and the -ERESTARTNOINTR approach would be
a little bit nicer, as long as it's easy to implement.


  reply	other threads:[~2024-10-22 19:09 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-20 16:20 [PATCH v2 0/5] implement lightweight guard pages Lorenzo Stoakes
2024-10-20 16:20 ` [PATCH v2 1/5] mm: pagewalk: add the ability to install PTEs Lorenzo Stoakes
2024-10-21 13:27   ` Vlastimil Babka
2024-10-21 13:50     ` Lorenzo Stoakes
2024-10-20 16:20 ` [PATCH v2 2/5] mm: add PTE_MARKER_GUARD PTE marker Lorenzo Stoakes
2024-10-21 13:45   ` Vlastimil Babka
2024-10-21 19:57     ` Lorenzo Stoakes
2024-10-21 20:42     ` Lorenzo Stoakes
2024-10-21 21:13       ` Lorenzo Stoakes
2024-10-21 21:20         ` Dave Hansen
2024-10-21 14:13   ` Vlastimil Babka
2024-10-21 14:33     ` Lorenzo Stoakes
2024-10-21 14:54       ` Vlastimil Babka
2024-10-21 15:33         ` Lorenzo Stoakes
2024-10-21 15:41           ` Lorenzo Stoakes
2024-10-21 16:00           ` David Hildenbrand
2024-10-21 16:23             ` Lorenzo Stoakes
2024-10-21 16:44               ` David Hildenbrand
2024-10-21 16:51                 ` Lorenzo Stoakes
2024-10-21 17:00                   ` David Hildenbrand
2024-10-21 17:14                     ` Lorenzo Stoakes
2024-10-21 17:21                       ` David Hildenbrand
2024-10-21 17:26                       ` Vlastimil Babka
2024-10-22 19:13                         ` David Hildenbrand
2024-10-20 16:20 ` [PATCH v2 3/5] mm: madvise: implement lightweight guard page mechanism Lorenzo Stoakes
2024-10-21 17:05   ` David Hildenbrand
2024-10-21 17:15     ` Lorenzo Stoakes
2024-10-21 17:23       ` David Hildenbrand
2024-10-21 19:25         ` John Hubbard
2024-10-21 19:39           ` Lorenzo Stoakes
2024-10-21 20:18             ` David Hildenbrand
2024-10-21 20:11   ` Vlastimil Babka
2024-10-21 20:17     ` David Hildenbrand
2024-10-21 20:25       ` Vlastimil Babka
2024-10-21 20:30         ` Lorenzo Stoakes
2024-10-21 20:37         ` David Hildenbrand
2024-10-21 20:49           ` Lorenzo Stoakes
2024-10-21 21:20             ` David Hildenbrand
2024-10-21 21:33               ` Lorenzo Stoakes
2024-10-21 21:35               ` Vlastimil Babka
2024-10-21 21:46                 ` Lorenzo Stoakes
2024-10-22 19:18                 ` David Hildenbrand
2024-10-21 20:27     ` Lorenzo Stoakes
2024-10-21 20:45       ` Vlastimil Babka
2024-10-22 19:08         ` Jann Horn [this message]
2024-10-22 19:35           ` Lorenzo Stoakes
2024-10-22 19:57             ` Jann Horn
2024-10-22 20:45               ` Lorenzo Stoakes
2024-10-20 16:20 ` [PATCH v2 4/5] tools: testing: update tools UAPI header for mman-common.h Lorenzo Stoakes
2024-10-20 16:20 ` [PATCH v2 5/5] selftests/mm: add self tests for guard page feature Lorenzo Stoakes
2024-10-21 21:31   ` Shuah Khan
2024-10-22 10:25     ` Lorenzo Stoakes
2024-10-20 17:37 ` [PATCH v2 0/5] implement lightweight guard pages Florian Weimer
2024-10-20 19:45   ` Lorenzo Stoakes
2024-10-23  6:24   ` Dmitry Vyukov
2024-10-23  7:19     ` David Hildenbrand
2024-10-23  8:11       ` Lorenzo Stoakes
2024-10-23  8:56         ` Dmitry Vyukov
2024-10-23  9:06           ` Vlastimil Babka
2024-10-23  9:13             ` David Hildenbrand
2024-10-23  9:18               ` Lorenzo Stoakes
2024-10-23  9:29                 ` David Hildenbrand
2024-10-23 11:31                   ` Marco Elver
2024-10-23 11:36                     ` David Hildenbrand
2024-10-23 11:40                       ` Lorenzo Stoakes
2024-10-23  9:17             ` Dmitry Vyukov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAG48ez3WS3EH9DuhE1b+7AX3+1=dVtd1M7y_5Ev4Shp2YxiYWg@mail.gmail.com' \
    --to=jannh@google.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=brauner@kernel.org \
    --cc=chris@zankel.net \
    --cc=david@redhat.com \
    --cc=deller@gmx.de \
    --cc=hch@infradead.org \
    --cc=ink@jurassic.park.msu.ru \
    --cc=jcmvbkbc@gmail.com \
    --cc=jeffxu@chromium.org \
    --cc=jhubbard@nvidia.com \
    --cc=linux-alpha@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-parisc@vger.kernel.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mattst88@gmail.com \
    --cc=muchun.song@linux.dev \
    --cc=paulmck@kernel.org \
    --cc=richard.henderson@linaro.org \
    --cc=shuah@kernel.org \
    --cc=sidhartha.kumar@oracle.com \
    --cc=surenb@google.com \
    --cc=tsbogend@alpha.franken.de \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox