From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Pedro Falcato <pfalcato@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@redhat.com>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Jann Horn <jannh@google.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH v2 1/3] mm: introduce new .mmap_prepare() file callback
Date: Wed, 14 May 2025 10:12:49 +0100 [thread overview]
Message-ID: <e18fea49-388d-40d2-9b55-b9f91ac3ce11@lucifer.local> (raw)
In-Reply-To: <dqir4mv7twugxj6nstqziympxc6z3k5act4cwhgpg2naeqy3sx@wkn4wvnwbpih>
On Wed, May 14, 2025 at 10:04:06AM +0100, Pedro Falcato wrote:
> On Fri, May 09, 2025 at 01:13:34PM +0100, Lorenzo Stoakes wrote:
> > Provide a means by which drivers can specify which fields of those
> > permitted to be changed should be altered to prior to mmap()'ing a
> > range (which may either result from a merge or from mapping an entirely new
> > VMA).
> >
> > Doing so is substantially safer than the existing .mmap() calback which
> > provides unrestricted access to the part-constructed VMA and permits
> > drivers and file systems to do 'creative' things which makes it hard to
> > reason about the state of the VMA after the function returns.
> >
> > The existing .mmap() callback's freedom has caused a great deal of issues,
> > especially in error handling, as unwinding the mmap() state has proven to
> > be non-trivial and caused significant issues in the past, for instance
> > those addressed in commit 5de195060b2e ("mm: resolve faulty mmap_region()
> > error path behaviour").
> >
> > It also necessitates a second attempt at merge once the .mmap() callback
> > has completed, which has caused issues in the past, is awkward, adds
> > overhead and is difficult to reason about.
> >
> > The .mmap_prepare() callback eliminates this requirement, as we can update
> > fields prior to even attempting the first merge. It is safer, as we heavily
> > restrict what can actually be modified, and being invoked very early in the
> > mmap() process, error handling can be performed safely with very little
> > unwinding of state required.
> >
> > The .mmap_prepare() and deprecated .mmap() callbacks are mutually
> > exclusive, so we permit only one to be invoked at a time.
> >
> > Update vma userland test stubs to account for changes.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Reviewed-by: Pedro Falcato <pfalcato@suse.de>
>
> Neat idea, thanks. This should also help out with the insane proliferation of
> vm_flags_set in ->mmap() callbacks all over. Hopefully.
Thanks! :)
>
> However, could we:
>
> 1) Add a small writeup to Documentation/filesystems/vfs.rst for this callback
I looked at this but the problem is the docs there are already hopelessly
out of date (see the kernel version referred to).
But I'll look into how best to do the docs going forward.
>
> 2) Possibly add a ->mmap_finish()? With a fully constructed vma at that point.
> So things like remap_pfn_range can still be used by drivers' mmap()
> implementation.
Thanks for raising the remap_pfn_range() case! Yes this is definitely a
thing.
However this proposed callback would totally undermine the purpose of the
change - the idea is to never give a vma because if we do so we lose all of
the advantages here and may as well just leave the mmap in place for
this...
However I do think we'll need a new callback at some point (previously
discussed in thread).
We could perhaps provide the option to _explicitly_ remap for instance. I
would want it to be heavily locked down as to what can happen and to happen
as early as possible.
This is something we can iterate on, as trying to find the ideal scheme
immediately will just lead to inaction, the big advantage with the approach
here is we can be iterative.
We provide this, use it in a scenario which allows us to eliminate merge
retry, and can take it from there :)
So indeed, watch this space basically... I will be highly proactive on this
stuff moving forward.
>
> 1) is particularly important so our VFS and driver friends know this is supposed
> to be The Way Forward.
I think probably the answer is for me to fully update the document to be
bang up to date, right? But that would obviously work best as a follow up
patch :)
Have added to todo, so will follow up on this thanks!
>
> --
> Pedro
Cheers, Lorenzo
next prev parent reply other threads:[~2025-05-14 9:13 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-09 12:13 [PATCH v2 0/3] eliminate mmap() retry merge, add .mmap_prepare hook Lorenzo Stoakes
2025-05-09 12:13 ` [PATCH v2 1/3] mm: introduce new .mmap_prepare() file callback Lorenzo Stoakes
2025-05-12 9:24 ` Christian Brauner
2025-05-12 11:31 ` Lorenzo Stoakes
2025-05-13 7:29 ` Christian Brauner
2025-05-12 13:37 ` Vlastimil Babka
2025-05-13 9:01 ` David Hildenbrand
2025-05-13 9:32 ` Lorenzo Stoakes
2025-05-13 13:25 ` David Hildenbrand
2025-05-13 15:31 ` Suren Baghdasaryan
2025-05-13 13:22 ` Liam R. Howlett
2025-05-14 9:04 ` Pedro Falcato
2025-05-14 9:12 ` Lorenzo Stoakes [this message]
2025-05-14 10:01 ` Pedro Falcato
2025-05-14 10:14 ` Lorenzo Stoakes
2025-05-09 12:13 ` [PATCH v2 2/3] mm: secretmem: convert to .mmap_prepare() hook Lorenzo Stoakes
2025-05-12 13:41 ` Vlastimil Babka
2025-05-13 13:23 ` Liam R. Howlett
2025-05-13 15:32 ` Suren Baghdasaryan
2025-05-09 12:13 ` [PATCH v2 3/3] mm/vma: remove mmap() retry merge Lorenzo Stoakes
2025-05-12 13:44 ` Vlastimil Babka
2025-05-13 13:25 ` Liam R. Howlett
2025-05-13 15:38 ` Suren Baghdasaryan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e18fea49-388d-40d2-9b55-b9f91ac3ce11@lucifer.local \
--to=lorenzo.stoakes@oracle.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=brauner@kernel.org \
--cc=david@redhat.com \
--cc=jack@suse.cz \
--cc=jannh@google.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=pfalcato@suse.de \
--cc=rppt@kernel.org \
--cc=surenb@google.com \
--cc=vbabka@suse.cz \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox