From: David Hildenbrand <david@redhat.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
Pedro Falcato <pfalcato@suse.de>, Barry Song <baohua@kernel.org>,
Dev Jain <dev.jain@arm.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH HOTFIX 6.17] mm/mremap: avoid expensive folio lookup on mremap folio pte batch
Date: Fri, 8 Aug 2025 09:45:39 +0200 [thread overview]
Message-ID: <b0d257a4-a37d-41da-92f9-4d1c0a11c30c@redhat.com> (raw)
In-Reply-To: <303b1764-6471-421f-b4c3-6a2585cee2ae@arm.com>
>
> Not sure if some sleep has changed your mind on what "hint" means? I'm pretty
> sure David named this function, but for me the name makes sense. The arch is
> saying "I know that the pte batch is at least N ptes. It's up to you if you use
> that information. I'll still work correctly if you ignore it".
The last one is the important bit I think.
>
> For me, your interpretation of 'the most number of PTEs that _might_ coalesce'
> would be a guess, not a hint.
I'm not a native speaker, so I'll let both of you figure that out. To me
it makes sense as well ... but well, I was involved when creating that
function. :)
>
>>
>> I understand the con PTE bit is a 'hint' but as I recall you saying at
>> LSF/MM 'modern CPUs take the hint'. Which presumably is where this comes
>> from, but that's kinda deceptive.
>>
>> Anyway the reason I was emphatic here is on the basis that I believe I had
>> this explained to met his way, which obviously I or whoever it was (don't
>> recall) must have misunderstood. Or perhaps I hallucinated it... :)
>
> FWIW, this is the documentation for the function:
>
> /**
> * pte_batch_hint - Number of pages that can be added to batch without scanning.
> * @ptep: Page table pointer for the entry.
> * @pte: Page table entry.
> *
> * Some architectures know that a set of contiguous ptes all map the same
> * contiguous memory with the same permissions. In this case, it can provide a
> * hint to aid pte batching without the core code needing to scan every pte.
> *
> * An architecture implementation may ignore the PTE accessed state. Further,
> * the dirty state must apply atomically to all the PTEs described by the hint.
> *
> * May be overridden by the architecture, else pte_batch_hint is always 1.
> */
It's actually ... surprisingly good after reading it again after at
least a year.
>
>>
>> I see that folio_pte_batch() can get _more_, is this on the basis of there
>> being adjacent, physically contiguous contPTE entries that can also be
>> batched up?
[...]
>>>>
>>>>>
>>>>>
>>>>> Not sure if that was discussed at some point before we went into the
>>>>> direction of using folios. But there really doesn't seem to be anything
>>>>> gained for other architectures here (as raised by Jann).
>>>>
>>>> Yup... I wonder about the other instances of this... ruh roh.
>>>
>>> IIRC prior to Dev's mprotect and mremap optimizations, I believe all sites
>>> already needed the folio. I haven't actually looked at how mprotect ended up,
>>> but maybe worth checking to see if it should protect with pte_batch_hint() too.
>>
>> mprotect didn't? I mean let's check.
>
> I think for mprotect, the folio was only previously needed for the numa case. I
> have a vague memory that either Dev of I proposed wrapping folio_pte_batch() to
> only get the folio and call it if the next PTE had an adjacent PFN (or something
> like that). But it was deemed to complex. I might be misremembering... could
> have been an internal conversation. I'll chat with Dev about it and revisit.
>
I am probably to blame here, because I think I rejected early to have
arm64-only optimization, assuming other arch could benefit here as well
with batching. But as it seems, batching in mremap() code really only
serves the cont-pte managing code, and the folio_pte_batch() is really
entirely unnecessary.
In case of mprotect(), I think really only (a) NUMA and (b) anon-folio
write-upgrade required the folio. So it's a bit more tricky than
mremap() here where ... the folio is entirely irrelevant.
One could detect the "anon write-upgrade possible" case early as well,
and only lookup the folio in that case, otherwise use the straight pte hint.
So I think there is some room for further improvement.
--
Cheers,
David / dhildenb
next prev parent reply other threads:[~2025-08-08 7:45 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-07 18:58 Lorenzo Stoakes
2025-08-07 19:10 ` David Hildenbrand
2025-08-07 19:20 ` Lorenzo Stoakes
2025-08-07 19:41 ` David Hildenbrand
2025-08-07 20:11 ` Lorenzo Stoakes
2025-08-07 21:01 ` Lorenzo Stoakes
2025-08-07 19:56 ` Ryan Roberts
2025-08-07 20:58 ` Lorenzo Stoakes
2025-08-08 5:18 ` Dev Jain
2025-08-08 7:19 ` Ryan Roberts
2025-08-08 7:45 ` David Hildenbrand [this message]
2025-08-08 7:56 ` Ryan Roberts
2025-08-08 8:44 ` Dev Jain
2025-08-08 9:50 ` Lorenzo Stoakes
2025-08-08 9:45 ` Lorenzo Stoakes
2025-08-08 9:40 ` Lorenzo Stoakes
2025-08-07 19:14 ` Pedro Falcato
2025-08-07 19:22 ` Lorenzo Stoakes
2025-08-07 19:33 ` David Hildenbrand
2025-08-08 5:19 ` Dev Jain
2025-08-08 9:56 ` Vlastimil Babka
2025-08-11 2:40 ` Barry Song
2025-08-11 4:57 ` Lorenzo Stoakes
2025-08-11 6:52 ` Barry Song
2025-08-11 15:08 ` Lorenzo Stoakes
2025-08-11 15:19 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b0d257a4-a37d-41da-92f9-4d1c0a11c30c@redhat.com \
--to=david@redhat.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=baohua@kernel.org \
--cc=dev.jain@arm.com \
--cc=jannh@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=pfalcato@suse.de \
--cc=ryan.roberts@arm.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox