From: Ryan Roberts <ryan.roberts@arm.com>
To: David Hildenbrand <david@redhat.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
Pedro Falcato <pfalcato@suse.de>, Barry Song <baohua@kernel.org>,
Dev Jain <dev.jain@arm.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH HOTFIX 6.17] mm/mremap: avoid expensive folio lookup on mremap folio pte batch
Date: Fri, 8 Aug 2025 08:56:56 +0100
Message-ID: <8391c672-1123-4499-8d28-a731f2d88a9e@arm.com>
In-Reply-To: <b0d257a4-a37d-41da-92f9-4d1c0a11c30c@redhat.com>
On 08/08/2025 08:45, David Hildenbrand wrote:
>>
>> Not sure if some sleep has changed your mind on what "hint" means? I'm pretty
>> sure David named this function, but for me the name makes sense. The arch is
>> saying "I know that the pte batch is at least N ptes. It's up to you if you use
>> that information. I'll still work correctly if you ignore it".
>
> The last one is the important bit I think.
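To make the lower-bound reading concrete, here is a small userspace model, not kernel code. `struct model_pte`, `generic_batch_hint()`, `contpte_batch_hint()` and `batch_len()` are hypothetical names. The contpte variant assumes naturally aligned 16-entry blocks (as with arm64 and 4K pages) and only imitates the shape of an arch hint. The batcher uses the hint purely to skip per-entry checks. As long as the contpte invariant holds, it therefore returns the same batch length whichever hint it is given, and only the number of comparisons changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a page table; not kernel code. */
#define CONT_PTES 16 /* assumption: 16-entry contiguous blocks (arm64, 4K pages) */

struct model_pte {
	unsigned long pfn;
	bool present;
	bool cont; /* set on every entry of a naturally aligned, fully mapped block */
};

typedef unsigned int (*hint_fn)(size_t idx, struct model_pte pte);

/* Generic fallback: no extra knowledge, so only this entry is vouched for. */
static unsigned int generic_batch_hint(size_t idx, struct model_pte pte)
{
	(void)idx;
	(void)pte;
	return 1;
}

/* contpte-style hint: entries from idx to the end of the aligned block are
 * guaranteed to map consecutive pfns, so they need not be checked. */
static unsigned int contpte_batch_hint(size_t idx, struct model_pte pte)
{
	if (!pte.present || !pte.cont)
		return 1;
	return CONT_PTES - (unsigned int)(idx & (CONT_PTES - 1));
}

/* Length of the run of present entries mapping consecutive pfns, starting at
 * ptes[start] and capped at max_nr. *checks counts entries actually compared. */
static size_t batch_len(const struct model_pte *ptes, size_t start, size_t max_nr,
			hint_fn hint, size_t *checks)
{
	size_t end = start + max_nr;
	size_t i = start;
	unsigned long expected;
	unsigned int nr;

	assert(max_nr > 0 && ptes[start].present);

	/* The first entry is in the batch by definition; skip what the hint vouches for. */
	nr = hint(i, ptes[i]);
	expected = ptes[i].pfn + nr;
	i += nr;

	while (i < end) {
		(*checks)++;
		if (!ptes[i].present || ptes[i].pfn != expected)
			break;
		nr = hint(i, ptes[i]);
		expected += nr;
		i += nr;
	}

	/* A hint can run past max_nr, so clamp the result. */
	return i - start < max_nr ? i - start : max_nr;
}
```

With 32 consecutive contpte entries, both hints yield a batch of 32, but the generic hint needs 31 comparisons while the contpte hint needs 1. A hint that runs past `max_nr` is simply clamped.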
>
>>
>> For me, your interpretation of 'the most number of PTEs that _might_ coalesce'
>> would be a guess, not a hint.
>
> I'm not a native speaker, so I'll let both of you figure that out. To me it
> makes sense as well ... but well, I was involved when creating that function. :)
>
>>
>>>
>>> I understand the contpte bit is a 'hint', but as I recall you saying at
>>> LSF/MM, 'modern CPUs take the hint', which presumably is where this comes
>>> from, but that's kinda deceptive.
>>>
>>> Anyway, the reason I was emphatic here is that I believe I had this
>>> explained to me this way, which obviously I or whoever it was (I don't
>>> recall) must have misunderstood. Or perhaps I hallucinated it... :)
>>
>> FWIW, this is the documentation for the function:
>>
>> /**
>> * pte_batch_hint - Number of pages that can be added to batch without scanning.
>> * @ptep: Page table pointer for the entry.
>> * @pte: Page table entry.
>> *
>> * Some architectures know that a set of contiguous ptes all map the same
>> * contiguous memory with the same permissions. In this case, it can provide a
>> * hint to aid pte batching without the core code needing to scan every pte.
>> *
>> * An architecture implementation may ignore the PTE accessed state. Further,
>> * the dirty state must apply atomically to all the PTEs described by the hint.
>> *
>> * May be overridden by the architecture, else pte_batch_hint is always 1.
>> */
>
> It's actually ... surprisingly good after reading it again after at least a year.
>
>>
>>>
>>> I see that folio_pte_batch() can get _more_. Is this on the basis of there
>>> being adjacent, physically contiguous contPTE entries that can also be
>>> batched up?
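One way to see how the result can exceed the hint: the sketch below is again a hypothetical userspace model, loosely shaped like the folio-bounded loop in folio_pte_batch(). It caps `max_nr` at the end of the folio and then keeps consulting the hint block by block for as long as the pfn run continues. `struct model_folio`, `folio_batch_len()` and the 16-entry block size are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not kernel code. */
#define CONT_PTES 16 /* assumption: 16-entry contiguous blocks (arm64, 4K pages) */

struct model_pte {
	unsigned long pfn;
	bool present;
	bool cont; /* set on every entry of a naturally aligned, fully mapped block */
};

struct model_folio {
	unsigned long pfn; /* first pfn of the folio */
	size_t nr_pages;   /* folio size in pages */
};

static unsigned int contpte_batch_hint(size_t idx, struct model_pte pte)
{
	if (!pte.present || !pte.cont)
		return 1;
	return CONT_PTES - (unsigned int)(idx & (CONT_PTES - 1));
}

/* Batch entries starting at ptes[start] that map consecutive pfns of the folio,
 * capped at max_nr and at the folio's end. The hint is re-evaluated at each step,
 * so the walk can cross into adjacent contiguous blocks. */
static size_t folio_batch_len(const struct model_folio *folio,
			      const struct model_pte *ptes, size_t start,
			      size_t max_nr)
{
	size_t i = start, end, left;
	unsigned long expected;
	unsigned int nr;

	assert(max_nr > 0 && ptes[start].present);
	assert(ptes[start].pfn >= folio->pfn &&
	       ptes[start].pfn < folio->pfn + folio->nr_pages);

	/* Never batch beyond the last page of the folio. */
	left = folio->pfn + folio->nr_pages - ptes[start].pfn;
	if (max_nr > left)
		max_nr = left;
	end = start + max_nr;

	nr = contpte_batch_hint(i, ptes[i]);
	expected = ptes[i].pfn + nr;
	i += nr;
	while (i < end) {
		if (!ptes[i].present || ptes[i].pfn != expected)
			break;
		nr = contpte_batch_hint(i, ptes[i]);
		expected += nr;
		i += nr;
	}
	return i - start < max_nr ? i - start : max_nr;
}
```

Within a 64-page folio, three fully mapped 16-entry blocks followed by unmapped entries give a batch of 48, even though the hint at the first entry is only 16. If the folio had only 32 pages, the batch would stop at 32 despite the physically consecutive mapping beyond it.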
>
> [...]
>
>>>>>
>>>>>>
>>>>>>
>>>>>> Not sure if that was discussed at some point before we went into the
>>>>>> direction of using folios. But there really doesn't seem to be anything
>>>>>> gained for other architectures here (as raised by Jann).
>>>>>
>>>>> Yup... I wonder about the other instances of this... ruh roh.
>>>>
>>>> IIRC, prior to Dev's mprotect and mremap optimizations, all sites already
>>>> needed the folio. I haven't actually looked at how mprotect ended up, but it
>>>> may be worth checking whether it should also guard the folio lookup with
>>>> pte_batch_hint().
>>>
>>> mprotect didn't? I mean let's check.
>>
>> I think for mprotect, the folio was only previously needed for the NUMA case. I
>> have a vague memory that either Dev or I proposed wrapping folio_pte_batch() to
>> only get the folio and call it if the next PTE had an adjacent PFN (or something
>> like that). But it was deemed too complex. I might be misremembering... could
>> have been an internal conversation. I'll chat with Dev about it and revisit.
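A minimal sketch of the wrapper described above, under stated assumptions. `batch_if_adjacent()` and `expensive_folio_lookup()` are hypothetical. The lookup merely counts calls in place of something like vm_normal_folio(), and the folio-boundary check a real implementation needs is omitted. The point is that the lookup is only paid for when the next entry continues the pfn run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not kernel code. */
struct model_pte {
	unsigned long pfn;
	bool present;
};

/* Counts calls to the stand-in for an expensive lookup such as vm_normal_folio(). */
static size_t folio_lookups;

static void expensive_folio_lookup(struct model_pte pte)
{
	(void)pte; /* a real lookup would map pte -> folio; this model only counts */
	folio_lookups++;
}

/* Return the batch length starting at ptes[i], capped at max_nr (entries
 * ptes[i .. i + max_nr - 1] must exist). The lookup and full scan only happen
 * when the very next entry maps the next pfn; otherwise the batch is 1. */
static size_t batch_if_adjacent(const struct model_pte *ptes, size_t i, size_t max_nr)
{
	size_t nr = 1;

	assert(max_nr > 0 && ptes[i].present);
	if (max_nr == 1 || !ptes[i + 1].present || ptes[i + 1].pfn != ptes[i].pfn + 1)
		return 1;

	expensive_folio_lookup(ptes[i]);
	/* Model omission: a real version would also stop at the folio's end. */
	while (nr < max_nr && ptes[i + nr].present &&
	       ptes[i + nr].pfn == ptes[i].pfn + nr)
		nr++;
	return nr;
}
```

For scattered order-0 mappings every call returns 1 without a lookup, while a run of four consecutive pfns costs one lookup and yields a batch of 4.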
>>
>
> I am probably to blame here, because I think I rejected an arm64-only
> optimization early on, assuming other archs could benefit from batching as
> well. But as it seems, batching in the mremap() code really only serves the
> cont-pte managing code, and the folio_pte_batch() call is entirely unnecessary.
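Taking that point literally, here is a sketch of mremap-side batching that relies on the hint alone and never looks up a folio. It is a hypothetical userspace model (`mremap_batch_model()`, `contpte_batch_hint()` and the 16-entry block size are assumptions), not the hotfix itself. Where the hint is always 1, the batch is 1 at no extra cost. With contpte, the batch extends to the end of the current block, capped at `max_nr`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not kernel code. */
#define CONT_PTES 16 /* assumption: 16-entry contiguous blocks (arm64, 4K pages) */

struct model_pte {
	unsigned long pfn;
	bool present;
	bool cont; /* set on every entry of a naturally aligned, fully mapped block */
};

static unsigned int contpte_batch_hint(size_t idx, struct model_pte pte)
{
	if (!pte.present || !pte.cont)
		return 1;
	return CONT_PTES - (unsigned int)(idx & (CONT_PTES - 1));
}

/* mremap-style batch that trusts the hint alone: no folio lookup and no scan. */
static size_t mremap_batch_model(const struct model_pte *ptes, size_t i, size_t max_nr)
{
	unsigned int nr;

	assert(max_nr > 0 && ptes[i].present);
	nr = contpte_batch_hint(i, ptes[i]);
	return nr < max_nr ? nr : max_nr;
}
```

Physically consecutive entries without the contiguous bit still give a batch of 1. That is the case the thread argues gained little from the folio lookup on other architectures.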
>
> In case of mprotect(), I think really only (a) NUMA and (b) anon-folio
> write-upgrade required the folio. So it's a bit more tricky than mremap() here,
> where ... the folio is entirely irrelevant.
>
> One could detect the "anon write-upgrade possible" case early as well, and only
> look up the folio in that case; otherwise, use the straight pte hint.
>
> So I think there is some room for further improvement.
ACK; Dev, perhaps you can take another look at this and work up a patch to more
aggressively avoid vm_normal_folio() for mprotect?
Thanks,
Ryan