From: "David Hildenbrand (Red Hat)" <david@kernel.org>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
	Pedro Falcato <pfalcato@suse.de>,
	Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Subject: Re: [PATCH] mm/mremap: allow VMAs with VM_DONTEXPAND|VM_PFNMAP when creating new mapping
Date: Fri, 21 Nov 2025 08:26:25 +0100
Message-ID: <ba9ec539-b0de-4f5b-888c-bf0aeb08e274@kernel.org>
In-Reply-To: <75dc53b9-bcd3-4271-ba7e-2762bec36e3d@lucifer.local>

On 11/20/25 10:58, Lorenzo Stoakes wrote:
> On Thu, Nov 20, 2025 at 10:49:59AM +0100, David Hildenbrand (Red Hat) wrote:
>> On 11/20/25 10:35, Lorenzo Stoakes wrote:
>>> On Thu, Nov 20, 2025 at 10:16:26AM +0100, David Hildenbrand (Red Hat) wrote:
>>>> On 11/20/25 10:04, Lorenzo Stoakes wrote:
>>>>> Hi Vivek, thanks for the patch.
>>>>>
>>>>> In general though, let's please not make a fundamental change to mremap()
>>>>> behaviour in late -rc6. Late in cycle/during merge window we're really only
>>>>> interested in existing series, series that are less involved than this.
>>>>>
>>>>> On Wed, Nov 19, 2025 at 09:35:46PM -0800, Vivek Kasireddy wrote:
>>>>>> When mremap is used to create a new mapping, we should not return
>>>>>> -EFAULT for VMAs with VM_DONTEXPAND or VM_PFNMAP flags set because
>>>>>> the old VMA would neither be expanded nor shrunk in this case. This
>>>>>
>>>>> I guess you're trying to be succinct here and 'clone' each input VMA using
>>>>> the 0 source size input.
>>>>>
>>>>> However this can't work.
>>>>>
>>>>> This operation is not equivalent to an mmap(). It may seem to be for
>>>>> ordinary mappings but in practice it isn't:
>>>>>
>>>>> (syscall)
>>>>> -> do_mremap()
>>>>> -> mremap_at()
>>>>> -> expand_vma()
>>>>> -> move_vma()
>>>>> -> copy_vma_and_data()
>>>>> -> copy_vma()
>>>>>
>>>>> Essentially copying the properties of the VMA to the new region.
>>>>>
>>>>> But this doesn't work for PFN map.
>>>>>
>>>>> At _no point_ are you invoking the original f_op->mmap or
>>>>> f_op->mmap_prepare handler.
>>>>>
>>>>> And these handlers for PFN maps set up page tables, because PFN maps
>>>>> literally do not exist as VMAs which have properties independent of their
>>>>> page tables like this.
>>>>
>>>> vfio-pci is a bit different, though, as it uses
>>>> vmf_insert_pfn()/vmf_insert_pfn_pmd()/vmf_insert_pfn_pud() at fault time to
>>>> insert PFNs, not at mmap time using remap_pfn_range() and friends.
>>>>
>>>> (see vfio_pci_mmap_page_fault() )
>>>
>>> It sets VM_DONTEXPAND but is fine with being expanded? :) That sounds like a
>>> bug there:
>>
>> Yeah, I am all confused about expansion. The example code looks like all it
>> wants to do is move a VM_PFNMAP mapping.
>>
>>           if (mremap(iov[i].iov_base, 0, iov[i].iov_len,
>>               MREMAP_FIXED | MREMAP_MAYMOVE, cur) == MAP_FAILED) {
>>               goto err;
>>           }
>>
>> I guess the expansion is because iov[i].iov_len is bigger than the
>> original VMA?
>>
>> Is that maybe a bug in QEMU or why are we even expanding here?
> 
> We're going from size 0 to iov[i].iov_len, which is saying 'please make a copy
> of this VMA at a new address'.
> 
> There's never any moving, as input size is 0 :)

Ah, so it is indeed cloning. The cloning as part of a "remap" operation 
is really confusing.
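
For anyone following along, the zero-size case can be seen from plain
userspace. Below is a minimal sketch, assuming a MAP_SHARED | MAP_ANONYMOUS
source mapping (so not the vfio-pci VM_PFNMAP case discussed here, where
the call currently fails). The helper name clone_shared_mapping_demo() is
made up for illustration.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if mremap() with old_size == 0 on a
 * shared anonymous page yields a second mapping of the same page,
 * 0 if it does not, and -1 if setting up the mappings fails.
 */
int clone_shared_mapping_demo(void)
{
	long page = sysconf(_SC_PAGESIZE);
	char *orig, *copy;
	int ok;

	if (page <= 0)
		return -1;

	orig = mmap(NULL, page, PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (orig == MAP_FAILED)
		return -1;
	strcpy(orig, "hello");

	/* old_size == 0: the source stays mapped, a second mapping is created. */
	copy = mremap(orig, 0, page, MREMAP_MAYMOVE);
	if (copy == MAP_FAILED) {
		munmap(orig, page);
		return -1;
	}

	/* Both addresses should now be backed by the same page. */
	ok = copy != orig && strcmp(copy, "hello") == 0;
	copy[0] = 'J';
	ok = ok && orig[0] == 'J';

	munmap(copy, page);
	munmap(orig, page);
	return ok;
}
```

On a kernel that follows the mremap(2) description of old_size == 0, I
would expect this to return 1: nothing is moved, the original mapping
stays in place and a write through one address is visible through the
other. Per the same man page, the call fails with EINVAL for private
mappings since Linux 4.14.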

> 
> It's a cute corner case way of using mremap().
> 
> We're basically asking for a _copy_. But you can't get a copy of a
> VM_DONTEXPAND/VM_PFNMAP because you need to invoke mmap_prepare (or legacy mmap)
> to get something sensible and you are bypassing that on expansion, even if it's
> a 'clone' style expansion.

Yes, agreed.

As Akihiko writes, what they want to achieve resembles what fork() does
to some extent. But there, the flow is rather different.

-- 
Cheers

David

