Re: [PATCH HOTFIX 6.17] mm/mremap: avoid expensive folio lookup on mremap folio pte batch

From: David Hildenbrand <david@redhat.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
	Pedro Falcato <pfalcato@suse.de>, Barry Song <baohua@kernel.org>,
	Dev Jain <dev.jain@arm.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH HOTFIX 6.17] mm/mremap: avoid expensive folio lookup on mremap folio pte batch
Date: Fri, 8 Aug 2025 09:45:39 +0200
Message-ID: <b0d257a4-a37d-41da-92f9-4d1c0a11c30c@redhat.com>
In-Reply-To: <303b1764-6471-421f-b4c3-6a2585cee2ae@arm.com>

> 
> Not sure if some sleep has changed your mind on what "hint" means? I'm pretty
> sure David named this function, but for me the name makes sense. The arch is
> saying "I know that the pte batch is at least N ptes. It's up to you if you use
> that information. I'll still work correctly if you ignore it".

The last one is the important bit I think.

> 
> For me, your interpretation of 'the most number of PTEs that _might_ coalesce'
> would be a guess, not a hint.

I'm not a native speaker, so I'll let both of you figure that out. To me 
it makes sense as well ... but well, I was involved when creating that 
function. :)

> 
>>
>> I understand the cont PTE bit is a 'hint' but as I recall you saying at
>> LSF/MM 'modern CPUs take the hint'. Which presumably is where this comes
>> from, but that's kinda deceptive.
>>
>> Anyway the reason I was emphatic here is on the basis that I believe I had
>> this explained to me this way, which obviously I or whoever it was (don't
>> recall) must have misunderstood. Or perhaps I hallucinated it... :)
> 
> FWIW, this is the documentation for the function:
> 
> /**
>   * pte_batch_hint - Number of pages that can be added to batch without scanning.
>   * @ptep: Page table pointer for the entry.
>   * @pte: Page table entry.
>   *
>   * Some architectures know that a set of contiguous ptes all map the same
>   * contiguous memory with the same permissions. In this case, it can provide a
>   * hint to aid pte batching without the core code needing to scan every pte.
>   *
>   * An architecture implementation may ignore the PTE accessed state. Further,
>   * the dirty state must apply atomically to all the PTEs described by the hint.
>   *
>   * May be overridden by the architecture, else pte_batch_hint is always 1.
>   */

It's actually ... surprisingly good after reading it again after at 
least a year.
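
To make the "still correct if you ignore it" property concrete, here is a
rough user-space model, not kernel code. The names model_pte,
MODEL_CONT_PTES, model_hint_generic(), model_hint_cont() and model_walk()
are all hypothetical, and the table is indexed by position rather than by
pointer. It only assumes what the kernel-doc above states: the default
hint is 1, and an architecture may report a larger batch when it knows a
block of entries is contiguous.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: number of entries in one arch-marked contiguous block. */
#define MODEL_CONT_PTES 16u

/* Hypothetical stand-in for a page table entry; not the kernel's pte_t. */
struct model_pte {
	unsigned long pfn;	/* page frame number the entry maps */
	bool cont;		/* arch-set "part of a contiguous block" bit */
};

typedef unsigned int (*model_hint_fn)(const struct model_pte *, size_t);

/* Default behaviour: no extra knowledge, so the batch is one entry. */
unsigned int model_hint_generic(const struct model_pte *table, size_t idx)
{
	(void)table;
	(void)idx;
	return 1;
}

/*
 * Arch-style hint: an entry marked contiguous is known to batch with
 * everything up to the end of its naturally aligned block.
 */
unsigned int model_hint_cont(const struct model_pte *table, size_t idx)
{
	if (!table[idx].cont)
		return 1;
	return MODEL_CONT_PTES - (unsigned int)(idx & (MODEL_CONT_PTES - 1));
}

/*
 * Walk nr entries, advancing by whatever the hint reports. Returns the
 * number of steps taken; *covered receives the number of entries visited,
 * which is nr no matter which hint function is used.
 */
size_t model_walk(const struct model_pte *table, size_t nr,
		  model_hint_fn hint, size_t *covered)
{
	size_t idx = 0, steps = 0;

	*covered = 0;
	while (idx < nr) {
		size_t step = hint(table, idx);

		assert(step >= 1);
		if (step > nr - idx)
			step = nr - idx;	/* never run past the range */
		*covered += step;
		idx += step;
		steps++;
	}
	return steps;
}
```

Walking the same table with either hint function covers the same entries;
only the number of steps differs, which is the sense in which the value
is a hint rather than a guess.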

> 
>>
>> I see that folio_pte_batch() can get _more_, is this on the basis of there
>> being adjacent, physically contiguous contPTE entries that can also be
>> batched up?

[...]

>>>>
>>>>>
>>>>>
>>>>> Not sure if that was discussed at some point before we went into the
>>>>> direction of using folios. But there really doesn't seem to be anything
>>>>> gained for other architectures here (as raised by Jann).
>>>>
>>>> Yup... I wonder about the other instances of this... ruh roh.
>>>
>>> IIRC prior to Dev's mprotect and mremap optimizations, I believe all sites
>>> already needed the folio. I haven't actually looked at how mprotect ended up,
>>> but maybe worth checking to see if it should protect with pte_batch_hint() too.
>>
>> mprotect didn't? I mean let's check.
> 
> I think for mprotect, the folio was only previously needed for the numa case. I
> have a vague memory that either Dev or I proposed wrapping folio_pte_batch() to
> only get the folio and call it if the next PTE had an adjacent PFN (or something
> like that). But it was deemed too complex. I might be misremembering... could
> have been an internal conversation. I'll chat with Dev about it and revisit.
> 

I am probably to blame here, because I think I rejected an arm64-only
optimization early on, assuming other architectures could benefit from
batching here as well. But as it seems, batching in the mremap() code
really only serves the cont-pte managing code, and the folio_pte_batch()
call is entirely unnecessary.
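
As a minimal sketch of what a hint-only batch length could look like for
the mremap() case, assuming the hint value has already been obtained from
pte_batch_hint(): the function name model_mremap_batch_len() and its
parameters are hypothetical, and this is a user-space illustration of the
idea, not the patch under discussion.

```c
#include <assert.h>

/*
 * Hypothetical helper: choose how many entries to move in one go using
 * only the arch hint, clamped to the entries left in the range. No folio
 * is consulted at any point.
 */
unsigned int model_mremap_batch_len(unsigned int hint, unsigned int max_nr)
{
	assert(hint >= 1 && max_nr >= 1);
	return hint < max_nr ? hint : max_nr;
}
```

On an architecture without an override the hint is always 1, so this
degenerates to moving one entry at a time with no lookup cost.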

In case of mprotect(), I think really only (a) NUMA and (b) anon-folio 
write-upgrade required the folio. So it's a bit more tricky than 
mremap() here where ... the folio is entirely irrelevant.

One could detect the "anon write-upgrade possible" case early as well, 
and only look up the folio in that case, otherwise use the straight pte hint.

So I think there is some room for further improvement.
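
A rough sketch of that gating for mprotect(), as a user-space model only.
The struct model_prot_ctx, its fields, model_needs_folio() and
model_prot_batch() are hypothetical names; the two conditions are the (a)
NUMA and (b) anon write-upgrade cases named above, and folio_batch stands
in for whatever a folio-based scan would have returned.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the protection change knows up front. */
struct model_prot_ctx {
	bool numa_hinting;	/* (a) change is made for NUMA hinting */
	bool anon_mapping;	/* range maps anonymous memory */
	bool write_upgrade;	/* (b) entries may be upgraded to writable */
};

/* Only these two cases need the folio in this model. */
bool model_needs_folio(const struct model_prot_ctx *ctx)
{
	assert(ctx);
	return ctx->numa_hinting ||
	       (ctx->anon_mapping && ctx->write_upgrade);
}

/*
 * Pick the batch length: the folio-derived value when the folio is
 * needed anyway, otherwise the plain pte hint. Clamp to max_nr.
 */
unsigned int model_prot_batch(const struct model_prot_ctx *ctx,
			      unsigned int hint, unsigned int folio_batch,
			      unsigned int max_nr)
{
	unsigned int nr = model_needs_folio(ctx) ? folio_batch : hint;

	return nr < max_nr ? nr : max_nr;
}
```

The point of the split is that the common case never pays for the folio
lookup, while the cases that inspect the folio anyway can still batch
across it.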

-- 
Cheers,

David / dhildenb
