linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm/mremap: Honour writable bit in mremap pte batching
@ 2025-10-28  6:39 Dev Jain
  2025-10-28 11:48 ` Pedro Falcato
  2025-10-28 16:41 ` Lorenzo Stoakes
  0 siblings, 2 replies; 4+ messages in thread
From: Dev Jain @ 2025-10-28  6:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dev Jain, stable, David Hildenbrand, Andrew Morton,
	Liam R. Howlett, Lorenzo Stoakes, Vlastimil Babka, Jann Horn,
	Pedro Falcato, Barry Song, open list:MEMORY MAPPING

Currently mremap folio pte batch ignores the writable bit during figuring
out a set of similar ptes mapping the same folio. Suppose that the first
pte of the batch is writable while the others are not - set_ptes will
end up setting the writable bit on the other ptes, which is a violation
of mremap semantics. Therefore, use FPB_RESPECT_WRITE to check the writable
bit while determining the pte batch.

Cc: stable@vger.kernel.org #6.17
Fixes: f822a9a81a31 ("mm: optimize mremap() by PTE batching")
Reported-by: David Hildenbrand <david@redhat.com>
Debugged-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
mm-selftests pass. Based on mm-new. Need David H. to confirm whether
the repro passes.

 mm/mremap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index a7f531c17b79..8ad06cf50783 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -187,7 +187,7 @@ static int mremap_folio_pte_batch(struct vm_area_struct *vma, unsigned long addr
 	if (!folio || !folio_test_large(folio))
 		return 1;
 
-	return folio_pte_batch(folio, ptep, pte, max_nr);
+	return folio_pte_batch_flags(folio, NULL, ptep, &pte, max_nr, FPB_RESPECT_WRITE);
 }
 
 static int move_ptes(struct pagetable_move_control *pmc,
-- 
2.30.2



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm/mremap: Honour writable bit in mremap pte batching
  2025-10-28  6:39 [PATCH] mm/mremap: Honour writable bit in mremap pte batching Dev Jain
@ 2025-10-28 11:48 ` Pedro Falcato
  2025-10-28 12:30   ` David Hildenbrand
  2025-10-28 16:41 ` Lorenzo Stoakes
  1 sibling, 1 reply; 4+ messages in thread
From: Pedro Falcato @ 2025-10-28 11:48 UTC (permalink / raw)
  To: Dev Jain
  Cc: linux-kernel, stable, David Hildenbrand, Andrew Morton,
	Liam R. Howlett, Lorenzo Stoakes, Vlastimil Babka, Jann Horn,
	Barry Song, open list:MEMORY MAPPING

On Tue, Oct 28, 2025 at 12:09:52PM +0530, Dev Jain wrote:
> Currently mremap folio pte batch ignores the writable bit during figuring
> out a set of similar ptes mapping the same folio. Suppose that the first
> pte of the batch is writable while the others are not - set_ptes will
> end up setting the writable bit on the other ptes, which is a violation
> of mremap semantics. Therefore, use FPB_RESPECT_WRITE to check the writable
> bit while determining the pte batch.
>

Hmm, it seems to be like we're doing the wrong thing by default here?
I must admit I haven't followed the contpte work as much as I would've
liked, but it doesn't make much sense to me why FPB_RESPECT_WRITE would
be an option you have to explicitly pass, and where folio_pte_batch (the
"simple" interface) doesn't Just Do The Right Thing for naive callers.

Auditing all callers:
 - khugepaged clears a variable number of ptes
 - memory.c clears a variable number of ptes
 - mempolicy.c grabs folios for migrations
 - mlock.c steps over nr_ptes - 1 ptes, speeding up traversal
 - mremap is borked since we're remapping nr_ptes ptes
 - rmap.c TTU unmaps nr_ptes ptes for a given folio

 so while the vast majority of callers don't seem to care, it would make
 sense that folio_pte_batch() works conservatively by default, and
 folio_pte_batch_flags() would allow for further batching (or maybe
 we would add a separate folio_pte_batch_clear() or
 folio_pte_batch_greedy() or whatnot).

> Cc: stable@vger.kernel.org #6.17
> Fixes: f822a9a81a31 ("mm: optimize mremap() by PTE batching")
> Reported-by: David Hildenbrand <david@redhat.com>
> Debugged-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Dev Jain <dev.jain@arm.com>

But the solution itself looks okay to me. so, fwiw:

Acked-by: Pedro Falcato <pfalcato@suse.de>

-- 
Pedro


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm/mremap: Honour writable bit in mremap pte batching
  2025-10-28 11:48 ` Pedro Falcato
@ 2025-10-28 12:30   ` David Hildenbrand
  0 siblings, 0 replies; 4+ messages in thread
From: David Hildenbrand @ 2025-10-28 12:30 UTC (permalink / raw)
  To: Pedro Falcato, Dev Jain
  Cc: linux-kernel, stable, Andrew Morton, Liam R. Howlett,
	Lorenzo Stoakes, Vlastimil Babka, Jann Horn, Barry Song,
	open list:MEMORY MAPPING

On 28.10.25 12:48, Pedro Falcato wrote:
> On Tue, Oct 28, 2025 at 12:09:52PM +0530, Dev Jain wrote:
>> Currently mremap folio pte batch ignores the writable bit during figuring
>> out a set of similar ptes mapping the same folio. Suppose that the first
>> pte of the batch is writable while the others are not - set_ptes will
>> end up setting the writable bit on the other ptes, which is a violation
>> of mremap semantics. Therefore, use FPB_RESPECT_WRITE to check the writable
>> bit while determining the pte batch.
>>
> 
> Hmm, it seems to be like we're doing the wrong thing by default here?
> I must admit I haven't followed the contpte work as much as I would've
> liked, but it doesn't make much sense to me why FPB_RESPECT_WRITE would
> be an option you have to explicitly pass, and where folio_pte_batch (the
> "simple" interface) doesn't Just Do The Right Thing for naive callers.

We use the "simple" version to apply to as many callers as possible: the 
common case, not some "let's be super careful" scenarios.

> 
> Auditing all callers:
>   - khugepaged clears a variable number of ptes
>   - memory.c clears a variable number of ptes
>   - mempolicy.c grabs folios for migrations
>   - mlock.c steps over nr_ptes - 1 ptes, speeding up traversal
>   - mremap is borked since we're remapping nr_ptes ptes
>   - rmap.c TTU unmaps nr_ptes ptes for a given folio
> 
>   so while the vast majority of callers don't seem to care, it would make
>   sense that folio_pte_batch() works conservatively by default, and
>   folio_pte_batch_flags() would allow for further batching (or maybe
>   we would add a separate folio_pte_batch_clear() or
>   folio_pte_batch_greedy() or whatnot).

I think really the tricky part is when we'e not only scanning or 
clearing, but actually want to "set" ptes again based on the result, 
like we do here.

For that we could consider having a second variant. But if it ends up 
having only a single caller, it's also not that great.

> 
>> Cc: stable@vger.kernel.org #6.17
>> Fixes: f822a9a81a31 ("mm: optimize mremap() by PTE batching")
>> Reported-by: David Hildenbrand <david@redhat.com>
>> Debugged-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Dev Jain <dev.jain@arm.com>
> 
> But the solution itself looks okay to me. so, fwiw:
> 
> Acked-by: Pedro Falcato <pfalcato@suse.de>
> 

Backport might end up being a bit tricky I suspect.

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm/mremap: Honour writable bit in mremap pte batching
  2025-10-28  6:39 [PATCH] mm/mremap: Honour writable bit in mremap pte batching Dev Jain
  2025-10-28 11:48 ` Pedro Falcato
@ 2025-10-28 16:41 ` Lorenzo Stoakes
  1 sibling, 0 replies; 4+ messages in thread
From: Lorenzo Stoakes @ 2025-10-28 16:41 UTC (permalink / raw)
  To: Dev Jain
  Cc: linux-kernel, stable, David Hildenbrand, Andrew Morton,
	Liam R. Howlett, Vlastimil Babka, Jann Horn, Pedro Falcato,
	Barry Song, open list:MEMORY MAPPING

You'll probably get a tag just from using the British English spelling of
'honour' from me :P (joking! ;)

On Tue, Oct 28, 2025 at 12:09:52PM +0530, Dev Jain wrote:
> Currently mremap folio pte batch ignores the writable bit during figuring
> out a set of similar ptes mapping the same folio. Suppose that the first
> pte of the batch is writable while the others are not - set_ptes will
> end up setting the writable bit on the other ptes, which is a violation
> of mremap semantics. Therefore, use FPB_RESPECT_WRITE to check the writable
> bit while determining the pte batch.

Yikes.

>
> Cc: stable@vger.kernel.org #6.17
> Fixes: f822a9a81a31 ("mm: optimize mremap() by PTE batching")
> Reported-by: David Hildenbrand <david@redhat.com>
> Debugged-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Dev Jain <dev.jain@arm.com>

LGTM, so:

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

> ---
> mm-selftests pass. Based on mm-new. Need David H. to confirm whether
> the repro passes.

Given he A-b'd I assume it did :)

>
>  mm/mremap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index a7f531c17b79..8ad06cf50783 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -187,7 +187,7 @@ static int mremap_folio_pte_batch(struct vm_area_struct *vma, unsigned long addr
>  	if (!folio || !folio_test_large(folio))
>  		return 1;
>
> -	return folio_pte_batch(folio, ptep, pte, max_nr);
> +	return folio_pte_batch_flags(folio, NULL, ptep, &pte, max_nr, FPB_RESPECT_WRITE);
>  }
>
>  static int move_ptes(struct pagetable_move_control *pmc,
> --
> 2.30.2
>


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-10-28 16:41 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-28  6:39 [PATCH] mm/mremap: Honour writable bit in mremap pte batching Dev Jain
2025-10-28 11:48 ` Pedro Falcato
2025-10-28 12:30   ` David Hildenbrand
2025-10-28 16:41 ` Lorenzo Stoakes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox