From: "Kasireddy, Vivek" <vivek.kasireddy@intel.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
Pedro Falcato <pfalcato@suse.de>,
David Hildenbrand <david@kernel.org>,
Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Subject: RE: [PATCH] mm/mremap: allow VMAs with VM_DONTEXPAND|VM_PFNMAP when creating new mapping
Date: Fri, 21 Nov 2025 06:51:31 +0000 [thread overview]
Message-ID: <IA0PR11MB71851184AA5977BBAFBD55C7F8D5A@IA0PR11MB7185.namprd11.prod.outlook.com> (raw)
In-Reply-To: <976e9916-c949-4fa0-b92e-87f6841b5cbe@lucifer.local>
Hi Lorenzo,
> Subject: Re: [PATCH] mm/mremap: allow VMAs with
> VM_DONTEXPAND|VM_PFNMAP when creating new mapping
>
> Hi Vivek, thanks for the patch.
>
> In general though, let's please not make a fundamental change to mremap()
> behaviour in late -rc6. Late in cycle/during merge window we're really only
> interested in existing series, series that are less involved than this.
I am sorry about the timing of this patch. My intended goal was to just start
a discussion to address this issue we were seeing in Qemu.
>
> On Wed, Nov 19, 2025 at 09:35:46PM -0800, Vivek Kasireddy wrote:
> > When mremap is used to create a new mapping, we should not return
> > -EFAULT for VMAs with VM_DONTEXPAND or VM_PFNMAP flags set
> because
> > the old VMA would neither be expanded nor shrunk in this case. This
>
> I guess you're trying to be succinct here and 'clone' each input VMA using
> the 0 source size input.
Yes, cloning would be the right term to describe what we are doing here.
>
> However this can't work.
>
> This operation is not equivalent to an mmap(). It may seem to be for
> ordinary mappings but in practice it isn't:
>
> (syscall)
> -> do_mremap()
> -> mremap_at()
> -> expand_vma()
> -> move_vma()
> -> copy_vma_and_data()
> -> copy_vma()
>
> Essentially copying the properties of the VMA to the new region.
>
> But this doesn't work for PFN map.
>
> At _no point_ are you invoking the original f_op->mmap or
> f_op->mmap_prepare handler.
If all the properties of the original VMA are going to be copied over to
the new VMA, wouldn't that mean whatever state, page tables were
setup via f_op->mmap or f_op->mmap_prepare handlers for the original
VMA, would also be copied over to the new VMA right?
>
> And these handles for PFN maps set up page tables, because PFN maps
> literally do not exist as VMAs which have properties independent of their
> page tables like this.
>
> Equally VM_DONTEXPAND implies the same kind of semantics.
>
> > is particularly useful when trying to create a new VMA using other
> > existing VMAs that have these flags set, such as the ones associated
> > with VFIO devices.
> >
> > Specifically, there are use-cases where a VMM such as Qemu would
> > want to map a non-contiguous buffer associated with a VFIO device
> > in the following way:
> >
> > void *start, *cur;
> > int i;
> >
> > start = mmap(NULL, size, PROT_NONE, MAP_SHARED, -1, 0);
> > if (start == MAP_FAILED) {
> > return start;
> > }
>
> Err, MAP_SHARED and -1 fd? Doesn't this always fail? It does locally with
> -EBADFD?
Sorry, I was trying different things and did not copy the version that had
the MAP_ANONYMOUS flag to the commit message. In other words, I had this
in my test code, which works:
start = mmap(NULL, size, PROT_NONE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
if (start == MAP_FAILED) {
return start;
}
>
> >
> > cur = start;
> > for (i = 0; i < iov_cnt; i++) {
> > if (mremap(iov[i].iov_base, 0, iov[i].iov_len,
> > MREMAP_FIXED | MREMAP_MAYMOVE, cur) == MAP_FAILED) {
>
> Trying to cleverly clone a VMA I guess.
>
> > goto err;
> > }
> > cur += iov[i].iov_len;
> > }
> > return start;
> >
> > The above code currently works when mapping buffers backed by
> > shmem (memfd) but fails with -EFAULT when mapping VFIO backed
>
> Right, which is expected, as described above, you _are_ expanding here.
If the original VMA would remain unaltered and all we are doing is
cloning it to a new address, would that still be considered expansion
of the original VMA?
>
> > buffers because the VMAs associated with iov[i].iov_base addresses
> > have VM_DONTEXPAND and VM_PFNMAP flags set. Therefore, fix this
> > issue by not returning -EFAULT when a new mapping is being created.
>
> We're not fixing an issue, we're (unfortunately, incorrectly) introducing
> an entirely new behaviour of permitting the remapping of page tables
> belonging to VMAs posssesing VM_DONTEXPAND and VM_PFNMAP flags.
I mentioned fixing because with this patch applied mremap no longer fails
with VFIO backed buffers and my use-case works without any further issues.
However, I do understand that just because it works doesn't mean anything
and that there may be other pitfalls that I did not run into, but I am wondering
what those are?
>
> >
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Jann Horn <jannh@google.com>
> > Cc: Pedro Falcato <pfalcato@suse.de>
> > Cc: David Hildenbrand <david@kernel.org>
> > Cc: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
> > Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> > ---
> > mm/mremap.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index fdb0485ede74..d3868d941f72 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -1736,7 +1736,8 @@ static int check_prep_vma(struct
> vma_remap_struct *vrm)
> > if (pgoff + (new_len >> PAGE_SHIFT) < pgoff)
> > return -EINVAL;
> >
> > - if (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP))
> > + if (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP) &&
> > + !vrm_implies_new_addr(vrm))
> > return -EFAULT;
>
> This is just incorrect. I mean firstly, as explained above, we simply do
> not support this kind of operation and cannot correctly.
>
> But also here you're allowing !(MREMAP_FIXED | MREMAP_DONTUNMAP)
> VMAs to
> expand VM_DONTEXPAND, VM_PFNMAP VMAs which is really really not
> right.
>
> The missing context here is:
>
> if (new_len == old_len)
> return 0;
>
> ... all code below involves an expansion of a VMA ...
>
> if (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP))
> return -EFAULT;
>
> And making _all_ expands that aren't _definitely_ specifying a new address
> work correclty. But you are in fact 100% specifying a new address because
> you're setting input length of 0...
>
> So this allows _anybody_ to try to incorrectly expand VM_PFNMAP,
> VM_DONTEXPAND VMAs. So is wildly incorrect.
Is the problem here with VM_DONTEXPAND or VM_PFNMAP or both?
In other words, assuming the original VMA did not have VM_DONTEXPAND,
would this behavior (what I am trying to do with mremap) be acceptable?
>
> >
> > if (!mlock_future_ok(mm, vma->vm_flags, vrm->delta))
> > --
> > 2.50.1
> >
> >
>
> In any case, I don't want to try to support this behaviour, I don't think
> there's really a sensible way to be sure we can _copy_ page tables
> correctly, and even though the source size = 0 is a cute trick you _ARE_
> expanding, and VM_DONTEXPAND is very clear - do not do this.
So, the thing that is still not clear to me is whether cloning a VMA would be
considered a form of expansion (of the original VMA) forbidden by
VM_DONTEXPAND?
>
> Additionally, please if trying to make such fundamental changes going
> forward _always_ provide test code, it's absolutely a requirement.
Ok, will do. Just to confirm, your suggestion is to add a new mm selftest
to exercise the targeted codepath right?
Thanks,
Vivek
>
> Thanks, Lorenzo
next prev parent reply other threads:[~2025-11-21 6:51 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-20 5:35 Vivek Kasireddy
2025-11-20 9:04 ` Lorenzo Stoakes
2025-11-20 9:16 ` David Hildenbrand (Red Hat)
2025-11-20 9:35 ` Lorenzo Stoakes
2025-11-20 9:49 ` David Hildenbrand (Red Hat)
2025-11-20 9:58 ` Lorenzo Stoakes
2025-11-21 3:05 ` Akihiko Odaki
2025-11-21 8:03 ` Lorenzo Stoakes
2025-11-21 8:48 ` Akihiko Odaki
2025-11-21 9:10 ` Lorenzo Stoakes
2025-11-21 10:16 ` Akihiko Odaki
2025-11-21 10:52 ` Lorenzo Stoakes
2025-11-21 7:26 ` David Hildenbrand (Red Hat)
2025-11-21 6:51 ` Kasireddy, Vivek [this message]
2025-11-21 7:52 ` Lorenzo Stoakes
2025-11-21 8:13 ` David Hildenbrand (Red Hat)
2025-11-21 15:03 ` Liam R. Howlett
2025-11-22 6:56 ` Kasireddy, Vivek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=IA0PR11MB71851184AA5977BBAFBD55C7F8D5A@IA0PR11MB7185.namprd11.prod.outlook.com \
--to=vivek.kasireddy@intel.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=david@kernel.org \
--cc=jannh@google.com \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=odaki@rsg.ci.i.u-tokyo.ac.jp \
--cc=pfalcato@suse.de \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox