From: Pedro Falcato <pedro.falcato@gmail.com>
To: Jeff Xu <jeffxu@chromium.org>
Cc: akpm@linux-foundation.org, keescook@chromium.org,
torvalds@linux-foundation.org, usama.anjum@collabora.com,
corbet@lwn.net, Liam.Howlett@oracle.com,
lorenzo.stoakes@oracle.com, jeffxu@google.com,
jorgelo@chromium.org, groeck@chromium.org,
linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
linux-mm@kvack.org, jannh@google.com, sroettger@google.com,
linux-hardening@vger.kernel.org, willy@infradead.org,
gregkh@linuxfoundation.org, deraadt@openbsd.org,
surenb@google.com, merimus@google.com, rdunlap@infradead.org,
stable@vger.kernel.org
Subject: Re: [PATCH v1 1/2] mseal: Two fixes for madvise(MADV_DONTNEED) when sealed
Date: Thu, 17 Oct 2024 21:49:24 +0100
Message-ID: <r5ljdglhtbapgqddtr6gxz5lszvq2yek2rd6bnllxk5i6difzv@imuu3pxh5fcc>
In-Reply-To: <CABi2SkXwOkoFcUUx=aALWVqurKhns+JKZqm2EyRTbHtROK8SKg@mail.gmail.com>

On Thu, Oct 17, 2024 at 01:34:53PM -0700, Jeff Xu wrote:
> Hi Pedro
>
> On Thu, Oct 17, 2024 at 12:37 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> >
> > > For PROT_NONE mappings, the previous blocking of
> > > madvise(MADV_DONTNEED) is unnecessary. As PROT_NONE already prohibits
> > > memory access, madvise(MADV_DONTNEED) should be allowed to proceed in
> > > order to free the page.
> >
> > I don't get it. Is there an actual use case for this?
> >
> Sealing should not block APIs that it can let through without a
> security concern; this is one such case.

Well, making the interface simple is also important. OpenBSD's mimmutable()
doesn't do any of this and it Just Works(tm)...

>
> There is a use case for this as well: sealing the NX stack on Android.
> Android uses PROT_NONE/madvise to set up a guard page that keeps the
> stack from running past its boundary, so we need to let madvise pass.

And you need to MADV_DONTNEED this guard page?
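
For reference, a minimal sketch of the guard-page setup as I understand it
from the description above. The helper names are hypothetical, this is not
Android's actual code, and the mseal() step is left out on purpose since it
needs a recent kernel:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map an anonymous stack region with one extra page
 * at its low end, then turn that page into a PROT_NONE guard page.
 * Returns the base of the mapping (the guard page), or NULL on error.
 */
static void *map_stack_with_guard(size_t stack_size)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t total;
	char *base;

	if (page <= 0)
		return NULL;

	total = stack_size + (size_t)page;
	base = mmap(NULL, total, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	/* Touch the future guard page so there is a page to free later. */
	base[0] = 1;

	if (mprotect(base, (size_t)page, PROT_NONE) != 0) {
		munmap(base, total);
		return NULL;
	}
	return base;
}

/*
 * Hypothetical helper: ask the kernel to drop whatever backs the guard
 * page. Returns the madvise() result (0 on success, -1 on error).
 */
static int drop_guard_page(void *base)
{
	long page = sysconf(_SC_PAGESIZE);

	return madvise(base, (size_t)page, MADV_DONTNEED);
}
```

On an unsealed mapping drop_guard_page() succeeds; the question in this
thread is only whether it should keep succeeding once the range is sealed.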
>
> > > For file-backed, private, read-only memory mappings, we previously did
> > > not block the madvise(MADV_DONTNEED). This was based on
> > > the assumption that the memory's content, being file-backed, could be
> > > retrieved from the file if accessed again. However, this assumption
> > > failed to consider scenarios where a mapping is initially created as
> > > read-write, modified, and subsequently changed to read-only. The newly
> > > introduced VM_WASWRITE flag addresses this oversight.
> >
> > We *do not* need this. It's sufficient to just block discard operations on read-only
> > private mappings.
> I think you meant blocking madvise(MADV_DONTNEED) on all read-only
> private file-backed mappings.
>
> I considered that option, but there is a use case for madvise on those
> mappings that never get modified.
>
> Apps can use that to free up RAM. For example, consider a read-only
> .text section that never gets modified: madvise(MADV_DONTNEED) can
> free up RAM when memory is under pressure, and the contents are read
> back from the backing file on the next access. Therefore we can't
> just block all read-only private file-backed mappings, only those
> that really need it, such as mappings changed from rw=>r (what you
> described).

Does anyone actually do this? If so, why? WHYYYY?
The kernel's page reclaim logic should be perfectly cromulent. Please don't do this.
MADV_DONTNEED will also not free any pages if those are shared (rather they'll just be unmapped).
If we really need to do this, I'd maybe suggest walking through page tables, looking for
anon ptes or swap ptes (maybe inside the actual zap code?). But I would really prefer if we
didn't need to do this.
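
To make the rw=>r case concrete, here is a rough userspace sketch (no
sealing involved, the helper name and temp file path are made up for
illustration). It writes to a private file mapping, makes it read-only,
and then discards it; on Linux the private write is gone afterwards and
the byte comes back from the file:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the first byte of a private file mapping
 * after it was modified, made read-only, and hit with MADV_DONTNEED.
 * Returns -1 if any setup step fails.
 */
static int byte_after_dontneed(void)
{
	char tmpl[] = "/tmp/dontneed-XXXXXX";	/* assumed writable */
	long page = sysconf(_SC_PAGESIZE);
	int fd, result = -1;
	char *buf, *map;

	if (page <= 0)
		return -1;

	fd = mkstemp(tmpl);
	if (fd < 0)
		return -1;
	unlink(tmpl);

	buf = malloc((size_t)page);
	if (!buf) {
		close(fd);
		return -1;
	}

	/* The file contains one page of 'A'. */
	memset(buf, 'A', (size_t)page);
	if (write(fd, buf, (size_t)page) != (ssize_t)page)
		goto out;

	map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	/* The private write creates an anonymous (CoW) page holding 'B'. */
	map[0] = 'B';

	/* rw => r, then discard: the anonymous page is dropped. */
	if (mprotect(map, (size_t)page, PROT_READ) == 0 &&
	    madvise(map, (size_t)page, MADV_DONTNEED) == 0)
		result = (unsigned char)map[0];

	munmap(map, (size_t)page);
out:
	free(buf);
	close(fd);
	return result;
}
```

The read after the discard faults the page back in from the file, so the
helper returns 'A' rather than 'B'. That is the data loss a sealed
read-only mapping is supposed to be protected from.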
--
Pedro