From: Kalesh Singh <kaleshsingh@google.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: David Hildenbrand <david@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Suren Baghdasaryan <surenb@google.com>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Matthew Wilcox <willy@infradead.org>,
Vlastimil Babka <vbabka@suse.cz>,
"Paul E . McKenney" <paulmck@kernel.org>,
Jann Horn <jannh@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Shuah Khan <shuah@kernel.org>,
linux-kselftest@vger.kernel.org, linux-api@vger.kernel.org,
John Hubbard <jhubbard@nvidia.com>,
Juan Yescas <jyescas@google.com>
Subject: Re: [PATCH 0/4] mm: permit guard regions for file-backed/shmem mappings
Date: Wed, 19 Feb 2025 12:56:31 -0800
Message-ID: <CAC_TJvfBvZZc=xyB0jez2VCDit-rettfQf7H4xhQbN7bYxKw-A@mail.gmail.com>
In-Reply-To: <e07dfd31-197c-49d0-92bd-12aad02daa7e@lucifer.local>

On Wed, Feb 19, 2025 at 11:20 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Wed, Feb 19, 2025 at 10:52:04AM -0800, Kalesh Singh wrote:
> > On Wed, Feb 19, 2025 at 1:17 AM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> > >
> > > On Wed, Feb 19, 2025 at 10:15:47AM +0100, David Hildenbrand wrote:
> > > > On 19.02.25 10:03, Lorenzo Stoakes wrote:
> > > > > On Wed, Feb 19, 2025 at 12:25:51AM -0800, Kalesh Singh wrote:
> > > > > > On Thu, Feb 13, 2025 at 10:18 AM Lorenzo Stoakes
> > > > > > <lorenzo.stoakes@oracle.com> wrote:
> > > > > > >
> > > > > > > The guard regions feature was initially implemented to support anonymous
> > > > > > > mappings only, excluding shmem.
> > > > > > >
> > > > > > > This was done such as to introduce the feature carefully and incrementally
> > > > > > > and to be conservative when considering the various caveats and corner
> > > > > > > cases that are applicable to file-backed mappings but not to anonymous
> > > > > > > ones.
> > > > > > >
> > > > > > > Now this feature has landed in 6.13, it is time to revisit this and to
> > > > > > > extend this functionality to file-backed and shmem mappings.
> > > > > > >
> > > > > > > In order to make this maximally useful, and since one may map file-backed
> > > > > > > mappings read-only (for instance ELF images), we also remove the
> > > > > > > restriction on read-only mappings and permit the establishment of guard
> > > > > > > regions in any non-hugetlb, non-mlock()'d mapping.
> > > > > >
> > > > > > Hi Lorenzo,
> > > > > >
> > > > > > Thank you for your work on this.
> > > > >
> > > > > You're welcome.
> > > > >
> > > > > >
> > > > > > Have we thought about how guard regions are represented in /proc/*/[s]maps?
> > > > >
> > > > > This is off-topic here but... Yes, extensively. No they do not appear
> > > > > there.
> > > > >
> > > > > I thought you had attended LPC and my talk where I mentioned this
> > > > > purposefully as a drawback?
> > > > >
> > > > > I went out of my way to advertise this limitation at the LPC talk, in the
> > > > > original series, etc. so it's a little disappointing that this is being
> > > > > brought up so late, but nobody else has raised objections to this issue so
> > > > > I think in general it's not a limitation that matters in practice.
> > > > >
> >
> > Sorry for raising this now, yes at the time I believe we discussed
> > reducing the vma slab memory usage for the PROT_NONE mappings. I
> > didn't imagine that apps could have dependencies on the mapped ELF
> > ranges in /proc/self/[s]maps until recent breakages from a similar
> > feature. Android itself doesn't depend on this but what I've seen is
> > banking apps and apps that have obfuscation to prevent reverse
> > engineering (the particulars of such obfuscation are a black box).
>
> Ack ok fair enough, sorry, but obviously you can understand it's
> frustrating when I went to great lengths to advertise this not only at the
> talk but in the original series.
>
> Really important to have these discussions early. Not that really we can do
> much about this, as inherently this feature cannot give you what you need.
>
> Is it _only_ banking apps that do this? And do they exclusively read
> /proc/$pid/maps? I mean there's nothing we can do about that, sorry.

Not only banking apps, but that is a common category.

> If that's immutable, then unless you do your own very, very, very slow custom
> android maps implementation (that will absolutely break the /proc/$pid/maps
> scalability efforts atm) this is just a no-go.
>

Yeah, unfortunately that's immutable, as app versions are mostly
independent of the OS version.

We do have something that handles this by encoding the guard regions
in the vm_flags, but as you can imagine it's not generic enough for
upstream.

> >
> > > > > >
> > > > > > In the field, I've found that many applications read the ranges from
> > > > > > /proc/self/[s]maps to determine what they can access (usually related
> > > > > > to obfuscation techniques). If they don't know of the guard regions it
> > > > > > would cause them to crash; I think that we'll need similar entries to
> > > > > > PROT_NONE (---p) for these, and generally to maintain consistency
> > > > > > between the behavior and what is being said from /proc/*/[s]maps.
> > > > >
> > > > > No, we cannot have these, sorry.
> > > > >
> > > > > Firstly /proc/$pid/[s]maps describes VMAs. The entire purpose of this
> > > > > feature is to avoid having to accumulate VMAs for regions which are not
> > > > > intended to be accessible.
> > > > >
> > > > > Secondly, there is no practical means for this to be accomplished in
> > > > > /proc/$pid/maps in _any_ way - as no metadata relating to a VMA indicates
> > > > > they have guard regions.
> > > > >
> > > > > This is intentional, because setting such metadata is simply not practical
> > > > > - why? Because when you try to split the VMA, how do you know which bit
> > > > > gets the metadata and which doesn't? You can't without _reading page
> > > > > tables_.
> >
> > Yeah the splitting becomes complicated with any vm flags for this...
> > meaning any attempt to expose this in /proc/*/maps has to
> > unconditionally walk the page tables :(
>
> It's not really complicated, it's _impossible_ unless you made literally
> all VMA code walk page tables for every single operation. Which we are
> emphatically not going to do :)
>
> And no, /proc/$pid/maps is _never_ going to walk page tables. For obvious
> performance reasons.
>
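
The splitting problem described above can be illustrated with a toy model.
This is only a sketch, not kernel code: ToyVMA, has_guards and guard_pages
are hypothetical names, and the set of page addresses merely stands in for
the page tables. Splitting on VMA metadata alone can only copy the marker to
both halves, while an accurate marker needs per-page state:

```python
from dataclasses import dataclass

PAGE = 0x1000  # toy page size, chosen for illustration only


@dataclass
class ToyVMA:
    start: int
    end: int
    has_guards: bool  # hypothetical per-VMA marker, not a real vm_flags bit


def split_blind(vma, addr):
    """Split at addr using VMA metadata only.

    Without per-page information there is no way to tell which half holds
    the guard pages, so both halves have to inherit the marker.
    """
    return (ToyVMA(vma.start, addr, vma.has_guards),
            ToyVMA(addr, vma.end, vma.has_guards))


def split_with_walk(vma, addr, guard_pages):
    """Split at addr, recomputing the marker from per-page state.

    guard_pages is a set of page addresses standing in for the page tables;
    getting an accurate marker means visiting every page in each half.
    """
    def any_guard(lo, hi):
        return any(page in guard_pages for page in range(lo, hi, PAGE))

    return (ToyVMA(vma.start, addr, any_guard(vma.start, addr)),
            ToyVMA(addr, vma.end, any_guard(addr, vma.end)))
```

With a single guard page at 0x4000 in a VMA covering [0x1000, 0x5000), a
blind split at 0x3000 marks both halves, whereas the walking split marks only
the upper half, at the cost of touching every page.
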
> >
> > > > >
> > > > > /proc/$pid/smaps _does_ read page tables, but we can't start pretending
> > > > > VMAs exist when they don't, this would be completely inaccurate, would
> > > > > break assumptions for things like mremap (which require a single VMA) and
> > > > > would be unworkable.
> > > > >
> > > > > The best that _could_ be achieved is to have a marker in /proc/$pid/smaps
> > > > > saying 'hey this region has guard regions somewhere'.
> > > >
> > > > And then simply expose it in /proc/$pid/pagemap, which is a better interface
> > > > for this pte-level information inside of VMAs. We should still have a spare
> > > > bit for that purpose in the pagemap entries.
> > >
> > > Ah yeah thanks David forgot about that!
> > >
> > > This is also a possibility if that'd solve your problems Kalesh?
> >
> > I'm not sure what is the correct interface to advertise these. Maybe
> > smaps as you suggested since we already walk the page tables there?
> > and pagemap bit for the exact pages as well? It won't solve this
> > particular issue, as 1000s of in field apps do look at this through
> > /proc/*/maps. But maybe we have to live with that...
>
> I mean why are we even considering this if you can't change this anywhere?
> Confused by that.
>
> I'm afraid upstream can't radically change interfaces to suit this
> scenario.
>
> We also can't change smaps in the way you want, it _has_ to still give
> output per VMA information.

Sorry, I wasn't suggesting changing the entries in smaps; I was
agreeing with your marker suggestion. Maybe a set of ranges for each
smaps entry that has guards? It doesn't solve the use case, but it does
make these regions visible to userspace.

>
> The proposed change that would be there would be a flag or something
> indicating that the VMA has guard regions _SOMEWHERE_ in it.
>
> Since this doesn't solve your problem, adds complexity, and nobody else
> seems to need it, I would suggest this is not worthwhile and I'd rather not
> do this.
>
> Therefore for your needs there are literally only two choices here:
>
> 1. Add a bit to /proc/$pid/pagemap OR
> 2. a new interface.
>
> I am not in favour of a new interface here, if we can just extend pagemap.
>
> What you'd have to do is:
>
> 1. Find virtual ranges via /proc/$pid/maps
> 2. iterate through /proc/$pid/pagemap to retrieve state for all ranges.
>
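
The two steps above could look roughly like this from userspace. This is a
sketch under assumptions: GUARD_BIT is a placeholder for whichever spare
pagemap bit might be assigned (the thread only says one should be available),
PAGE_SIZE assumes 4 KiB pages, and the function names are made up for
illustration:

```python
import struct

PAGE_SIZE = 4096        # assumption: 4 KiB pages; query the real value in practice
PAGEMAP_ENTRY_SIZE = 8  # each pagemap entry is one 64-bit word
GUARD_BIT = 58          # hypothetical placeholder for the spare bit


def parse_maps_line(line):
    """Return (start, end, perms) for one /proc/<pid>/maps line."""
    fields = line.split()
    start_s, end_s = fields[0].split("-")
    return int(start_s, 16), int(end_s, 16), fields[1]


def guard_ranges(entries, start, page_size=PAGE_SIZE, bit=GUARD_BIT):
    """Coalesce consecutive flagged pagemap entries into [lo, hi) ranges."""
    ranges = []
    run_start = None
    for i, entry in enumerate(entries):
        addr = start + i * page_size
        if (entry >> bit) & 1:
            if run_start is None:
                run_start = addr
        elif run_start is not None:
            ranges.append((run_start, addr))
            run_start = None
    if run_start is not None:
        ranges.append((run_start, start + len(entries) * page_size))
    return ranges


def read_pagemap(pid, start, end, page_size=PAGE_SIZE):
    """Read the 64-bit pagemap entries covering [start, end)."""
    npages = (end - start) // page_size
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((start // page_size) * PAGEMAP_ENTRY_SIZE)
        data = f.read(npages * PAGEMAP_ENTRY_SIZE)
    count = len(data) // PAGEMAP_ENTRY_SIZE
    # "=" selects native byte order with standard sizes (Q is 8 bytes).
    return list(struct.unpack(f"={count}Q", data[:count * PAGEMAP_ENTRY_SIZE]))


def scan(pid="self"):
    """Step 1: walk maps. Step 2: check pagemap for each range."""
    with open(f"/proc/{pid}/maps") as f:
        for line in f:
            start, end, _perms = parse_maps_line(line)
            try:
                entries = read_pagemap(pid, start, end)
            except OSError:
                continue  # some ranges may not be readable via pagemap
            yield from guard_ranges(entries, start)
```

Every page of every mapping is visited, which is the cost being described:
any interface that reports guard regions has to do an equivalent walk.
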
Could we also consider an smaps field like:

    VmGuards: [AAA, BBB), [CCC, DDD), ...

or something of that sort?
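
To make the suggestion concrete, here is a sketch of how such a line could be
produced and consumed. No such smaps field exists; the name VmGuards, the hex
formatting and both function names are assumptions taken only from the
example line above:

```python
import re

# Matches one half-open range such as "[2000, 4000)".
_RANGE_RE = re.compile(r"\[([0-9a-f]+), ([0-9a-f]+)\)")


def format_vmguards(ranges):
    """Render [lo, hi) ranges as a hypothetical VmGuards: line."""
    body = ", ".join(f"[{lo:x}, {hi:x})" for lo, hi in ranges)
    return f"VmGuards: {body}"


def parse_vmguards(line):
    """Parse a hypothetical VmGuards: line back into (lo, hi) tuples."""
    return [(int(lo, 16), int(hi, 16)) for lo, hi in _RANGE_RE.findall(line)]
```

A reader that already parses smaps per VMA would only need the second
function to learn which parts of that VMA fault on access.
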
> Since anything that would retrieve guard region state would need to walk
> page tables, any approach would be slow and I don't think this would be any
> less slow than any other interface.
>
> This way you'd be able to find all guard regions all the time.
>
> This is just the trade-off for this feature unfortunately - its whole
> design ethos is to allow modification of -faulting- behaviour without
> having to modify -VMA- behaviour.
>
> But if it's banking apps whose code you can't control (surprised you don't
> lock down these interfaces), I mean is this even useful to you?
>
> If your requirement is 'you have to change /proc/$pid/maps to show guard
> regions' I mean the answer is that we can't.
>
> >
> > We can argue that such apps are broken since they may trip on the
> > SIGBUS off the end of the file -- usually this isn't the case for the
> > ELF segment mappings.
>
> Or tearing of the maps interface, or things getting unmapped or or
> or... It's really not a sane thing to do.
>
> >
> > This is still useful for other cases, I just wanted to get some ideas
> > if this can be extended to further use cases.
>
> Well I'm glad that you guys find it useful for _something_ ;)
>
> Again this wasn't written only for you (it is broadly a good feature for
> upstream), but I did have your use case in mind, so I'm a little
> disappointed that it doesn't help, as I like to solve problems.
>
> But I'm glad it solves at least some for you...

I recall Liam had a proposal to store the guard ranges in the maple tree?
I wonder whether that could be combined with this approach to give
a better representation of the guard regions.

>
> >
> > Thanks,
> > Kalesh
> >
> >
> > >
> > > This bit will be fought over haha
> > >
> > > >
> > > > --
> > > > Cheers,
> > > >
> > > > David / dhildenb
> > > >