linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Pedro Falcato <pfalcato@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Jann Horn <jannh@google.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH v2 1/3] mm: introduce new .mmap_prepare() file callback
Date: Wed, 14 May 2025 11:14:46 +0100	[thread overview]
Message-ID: <cdfa1a38-2740-4d87-a16e-c1e73e276453@lucifer.local> (raw)
In-Reply-To: <e3olqweuiyxlkewat26n4vyltpn5ordg6ckfndejwjleoemrtw@e2mo6nocaepy>

On Wed, May 14, 2025 at 11:01:49AM +0100, Pedro Falcato wrote:
> On Wed, May 14, 2025 at 10:12:49AM +0100, Lorenzo Stoakes wrote:
> > On Wed, May 14, 2025 at 10:04:06AM +0100, Pedro Falcato wrote:
> > > On Fri, May 09, 2025 at 01:13:34PM +0100, Lorenzo Stoakes wrote:
> > > > Provide a means by which drivers can specify which fields of those
> > > > permitted to be changed should be altered to prior to mmap()'ing a
> > > > range (which may either result from a merge or from mapping an entirely new
> > > > VMA).
> > > >
> > > > Doing so is substantially safer than the existing .mmap() calback which
> > > > provides unrestricted access to the part-constructed VMA and permits
> > > > drivers and file systems to do 'creative' things which makes it hard to
> > > > reason about the state of the VMA after the function returns.
> > > >
> > > > The existing .mmap() callback's freedom has caused a great deal of issues,
> > > > especially in error handling, as unwinding the mmap() state has proven to
> > > > be non-trivial and caused significant issues in the past, for instance
> > > > those addressed in commit 5de195060b2e ("mm: resolve faulty mmap_region()
> > > > error path behaviour").
> > > >
> > > > It also necessitates a second attempt at merge once the .mmap() callback
> > > > has completed, which has caused issues in the past, is awkward, adds
> > > > overhead and is difficult to reason about.
> > > >
> > > > The .mmap_prepare() callback eliminates this requirement, as we can update
> > > > fields prior to even attempting the first merge. It is safer, as we heavily
> > > > restrict what can actually be modified, and being invoked very early in the
> > > > mmap() process, error handling can be performed safely with very little
> > > > unwinding of state required.
> > > >
> > > > The .mmap_prepare() and deprecated .mmap() callbacks are mutually
> > > > exclusive, so we permit only one to be invoked at a time.
> > > >
> > > > Update vma userland test stubs to account for changes.
> > > >
> > > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > >
> > > Reviewed-by: Pedro Falcato <pfalcato@suse.de>
> > >
> > > Neat idea, thanks. This should also help out with the insane proliferation of
> > > vm_flags_set in ->mmap() callbacks all over. Hopefully.
> [snip]
> > >
> > > 2) Possibly add a ->mmap_finish()? With a fully constructed vma at that point.
> > >    So things like remap_pfn_range can still be used by drivers' mmap()
> > >    implementation.
> >
> > Thanks for raising the remap_pfn_range() case! Yes this is definitely a
> > thing.
> >
> > However this proposed callback would totally undermine the purpose of the
> > change - the idea is to never give a vma because if we do so we lose all of
> > the advantages here and may as well just leave the mmap in place for
> > this...
> >
>
> Yes, good point.
>
> > However I do think we'll need a new callback at some point (previously
> > discussed in thread).
> >
> > We could perhaps provide the option to _explicitly_ remap for instance. I
> > would want it to be heavily locked down as to what can happen and to happen
> > as early as possible.
> >
>
> I think we can simply combine various ideas here. Like:
>
> struct vm_area_desc_private {
> 	struct vm_area_desc desc;
> 	struct vm_area_struct *vma;
> };
>
> Then, for this "mmap_finish" callback and associated infra:
>
> 	int (*mmap_finish)(struct vm_area_desc *desc);
>
> int mmap_remap_pfn_range(struct vm_area_desc *desc, /*...*/)
> {
> 	struct vm_area_desc_private *private = container_of(desc, struct vm_area_desc_private, desc);
> 	return remap_pfn_range(private->vma, /*...*/);
> }
>
> int random_driver_mmap_finish(struct vm_area_desc *desc)
> {
> 	return mmap_remap_pfn_range(desc, desc->start, some_pfn, some_size,
> 				    desc->page_prot);
> }
>
>

Yeah that's really nice actually! I like that. Will note down for when we
come to this.

Obviously we need to know whent the finish implies a remap, but we can have
a field for this potentially set in the vm_area_desc to indicate this.

> I think something of the sort would be quite less prone to abuse, and we could
> take the time to then even polish up the interface (e.g maybe it would be nicer
> if mmap_remap_pfn_range took a vma offset, and not a start address).

Right yeah, David definitely wanted some improvements in this area so this
aligns with other intentions/work.

>
> Anyway, just brainstorming. This idea came to mind, I think it's quite interesting.

Nice ideas :) yeah, good chance to undo some sins here.

>
> > This is something we can iterate on, as trying to find the ideal scheme
> > immediately will just lead to inaction, the big advantage with the approach
> > here is we can be iterative.
> >
> > We provide this, use it in a scenario which allows us to eliminate merge
> > retry, and can take it from there :)
>
> Yep, totally.
>
> >
> > So indeed, watch this space basically... I will be highly proactive on this
> > stuff moving forward.
> >
> > >
> > > 1) is particularly important so our VFS and driver friends know this is supposed
> > > to be The Way Forward.
> >
> > I think probably the answer is for me to fully update the document to be
> > bang up to date, right? But that would obviously work best as a follow up
> > patch :)
> >
>
> You love your big projects :p I had the impression the docs were more or less up to date?
> The VFS people do update it somewhat diligently. And for mm we only have ->mmap, ->get_unmapped_area,
> and now ->mmap_prepare. And the descriptions are ATM quite useless, just
> "called by the mmap(2) system call".

Much like leg day, sometimes painful things are worthwhile things :)

As for docs I saw there were some missing uring entries, and it explicitly
says 'as of kernel 4.18'.

So I feel that it really needs to be properly updated and it ends up being
a little out of scope to the series.

Since this series only uses this callback in one, mm-specific, place, I
think it makes sense to attack this on a follow up, alongside updating file
system callbacks etc.

But it's definitely on the todo! :)

Cheers, Lorenzo

>
> --
> Pedro


  reply	other threads:[~2025-05-14 10:15 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-09 12:13 [PATCH v2 0/3] eliminate mmap() retry merge, add .mmap_prepare hook Lorenzo Stoakes
2025-05-09 12:13 ` [PATCH v2 1/3] mm: introduce new .mmap_prepare() file callback Lorenzo Stoakes
2025-05-12  9:24   ` Christian Brauner
2025-05-12 11:31     ` Lorenzo Stoakes
2025-05-13  7:29       ` Christian Brauner
2025-05-12 13:37   ` Vlastimil Babka
2025-05-13  9:01   ` David Hildenbrand
2025-05-13  9:32     ` Lorenzo Stoakes
2025-05-13 13:25       ` David Hildenbrand
2025-05-13 15:31         ` Suren Baghdasaryan
2025-05-13 13:22   ` Liam R. Howlett
2025-05-14  9:04   ` Pedro Falcato
2025-05-14  9:12     ` Lorenzo Stoakes
2025-05-14 10:01       ` Pedro Falcato
2025-05-14 10:14         ` Lorenzo Stoakes [this message]
2025-05-09 12:13 ` [PATCH v2 2/3] mm: secretmem: convert to .mmap_prepare() hook Lorenzo Stoakes
2025-05-12 13:41   ` Vlastimil Babka
2025-05-13 13:23   ` Liam R. Howlett
2025-05-13 15:32     ` Suren Baghdasaryan
2025-05-09 12:13 ` [PATCH v2 3/3] mm/vma: remove mmap() retry merge Lorenzo Stoakes
2025-05-12 13:44   ` Vlastimil Babka
2025-05-13 13:25   ` Liam R. Howlett
2025-05-13 15:38     ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cdfa1a38-2740-4d87-a16e-c1e73e276453@lucifer.local \
    --to=lorenzo.stoakes@oracle.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=brauner@kernel.org \
    --cc=david@redhat.com \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=pfalcato@suse.de \
    --cc=rppt@kernel.org \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox