linux-mm.kvack.org archive mirror
From: Ryan Roberts <ryan.roberts@arm.com>
To: David Hildenbrand <david@redhat.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
	Pedro Falcato <pfalcato@suse.de>, Barry Song <baohua@kernel.org>,
	Dev Jain <dev.jain@arm.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH HOTFIX 6.17] mm/mremap: avoid expensive folio lookup on mremap folio pte batch
Date: Fri, 8 Aug 2025 08:56:56 +0100
Message-ID: <8391c672-1123-4499-8d28-a731f2d88a9e@arm.com>
In-Reply-To: <b0d257a4-a37d-41da-92f9-4d1c0a11c30c@redhat.com>

On 08/08/2025 08:45, David Hildenbrand wrote:
>>
>> Not sure if some sleep has changed your mind on what "hint" means? I'm pretty
>> sure David named this function, but for me the name makes sense. The arch is
>> saying "I know that the pte batch is at least N ptes. It's up to you if you use
>> that information. I'll still work correctly if you ignore it".
> 
> The last one is the important bit I think.
> 
>>
>> For me, your interpretation of 'the most number of PTEs that _might_ coalesce'
>> would be a guess, not a hint.
> 
> I'm not a native speaker, so I'll let both of you figure that out. To me it
> makes sense as well ... but well, I was involved when creating that function. :)
> 
>>
>>>
>>> I understand the contPTE bit is a 'hint' but as I recall you saying at
>>> LSF/MM 'modern CPUs take the hint'. Which presumably is where this comes
>>> from, but that's kinda deceptive.
>>>
>>> Anyway the reason I was emphatic here is on the basis that I believe I had
>>> this explained to me this way, which obviously I or whoever it was (don't
>>> recall) must have misunderstood. Or perhaps I hallucinated it... :)
>>
>> FWIW, this is the documentation for the function:
>>
>> /**
>>   * pte_batch_hint - Number of pages that can be added to batch without scanning.
>>   * @ptep: Page table pointer for the entry.
>>   * @pte: Page table entry.
>>   *
>>   * Some architectures know that a set of contiguous ptes all map the same
>>   * contiguous memory with the same permissions. In this case, it can provide a
>>   * hint to aid pte batching without the core code needing to scan every pte.
>>   *
>>   * An architecture implementation may ignore the PTE accessed state. Further,
>>   * the dirty state must apply atomically to all the PTEs described by the hint.
>>   *
>>   * May be overridden by the architecture, else pte_batch_hint is always 1.
>>   */
> 
> It's actually ... surprisingly good after reading it again after at least a year.
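
To make the "at least N ptes, and you may ignore it" reading of the kerneldoc
concrete, here is a minimal userspace sketch. It is a toy model, not kernel
code: struct toy_pte, TOY_CONT_PTES and every toy_* helper are hypothetical
names invented for illustration, and the block size of 16 is an assumption.
The hint only ever reports what is left of one contiguous block, while a
caller that scans entry by entry may find a longer run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or helpers are the kernel's. */
#define TOY_CONT_PTES 16u	/* assumed entries per contiguous block */

struct toy_pte {
	unsigned long pfn;	/* page frame number mapped by this entry */
	bool present;
	bool cont;		/* models an arch "contiguous" bit */
};

/*
 * Lower bound on how many entries, starting at idx, are known to form a
 * batch without inspecting them. Returning 1 is always correct, which
 * models the generic fallback described in the kerneldoc.
 */
static unsigned int toy_pte_batch_hint(const struct toy_pte *table, size_t idx)
{
	if (!table[idx].present || !table[idx].cont)
		return 1;
	/* Entries left up to the end of the naturally aligned block. */
	return TOY_CONT_PTES - (unsigned int)(idx & (TOY_CONT_PTES - 1));
}

/* Caller that ignores the hint: scan while the PFNs stay consecutive. */
static unsigned int toy_batch_by_scan(const struct toy_pte *table, size_t idx,
				      unsigned int max_nr)
{
	unsigned int nr = 1;

	assert(max_nr >= 1);
	while (nr < max_nr && table[idx + nr].present &&
	       table[idx + nr].pfn == table[idx].pfn + nr)
		nr++;
	return nr;
}

/* Caller that trusts the hint, clamped to what is left in the range. */
static unsigned int toy_batch_by_hint(const struct toy_pte *table, size_t idx,
				      unsigned int max_nr)
{
	unsigned int nr = toy_pte_batch_hint(table, idx);

	assert(max_nr >= 1);
	return nr < max_nr ? nr : max_nr;
}
```

In this model both callers are correct. The hint-based one never reports more
entries than the scan would, and it gets its answer without reading the
following entries.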
> 
>>
>>>
>>> I see that folio_pte_batch() can get _more_, is this on the basis of there
>>> being adjacent, physically contiguous contPTE entries that can also be
>>> batched up?
> 
> [...]
> 
>>>>>
>>>>>>
>>>>>>
>>>>>> Not sure if that was discussed at some point before we went into the
>>>>>> direction of using folios. But there really doesn't seem to be anything
>>>>>> gained for other architectures here (as raised by Jann).
>>>>>
>>>>> Yup... I wonder about the other instances of this... ruh roh.
>>>>
>>>> IIRC prior to Dev's mprotect and mremap optimizations, I believe all sites
>>>> already needed the folio. I haven't actually looked at how mprotect ended up,
>>>> but maybe worth checking to see if it should protect with pte_batch_hint() too.
>>>
>>> mprotect didn't? I mean let's check.
>>
>> I think for mprotect, the folio was only previously needed for the numa case. I
>> have a vague memory that either Dev or I proposed wrapping folio_pte_batch() to
>> only get the folio and call it if the next PTE had an adjacent PFN (or something
>> like that). But it was deemed too complex. I might be misremembering... could
>> have been an internal conversation. I'll chat with Dev about it and revisit.
>>
> 
> I am probably to blame here, because I think I rejected having an arm64-only
> optimization early on, assuming other archs could benefit from batching as well. But
> as it seems, batching in mremap() code really only serves the cont-pte managing
> code, and the folio_pte_batch() is really entirely unnecessary.
> 
> In case of mprotect(), I think really only (a) NUMA and (b) anon-folio write-
> upgrade required the folio. So it's a bit more tricky than mremap() here
> where ... the folio is entirely irrelevant.
> 
> One could detect the "anon write-upgrade possible" case early as well, and only
> look up the folio in that case, otherwise use the straight pte hint.
> 
> So I think there is some room for further improvement.

ACK; Dev, perhaps you can take another look at this and work up a patch to more
aggressively avoid vm_normal_folio() for mprotect?

Thanks,
Ryan
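
The mprotect idea discussed above (decide early whether the folio is needed,
and fall back to the plain pte hint otherwise) can be sketched as a small
decision helper. This is a userspace illustration under stated assumptions,
not a proposed patch: struct toy_prot_change, its two flags and the toy_*
functions are hypothetical, and only the two conditions named in the thread
(NUMA, anon write-upgrade) are modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of one protection change; not a kernel struct. */
struct toy_prot_change {
	bool numa_hinting;			/* change is made for NUMA */
	bool anon_write_upgrade_possible;	/* detected early, per the thread */
};

enum toy_batch_source {
	TOY_BATCH_FROM_HINT,	/* no folio lookup, use the pte hint only */
	TOY_BATCH_FROM_FOLIO,	/* look up the folio, then batch on it */
};

/* The folio is only required in the two cases named in the thread. */
static bool toy_needs_folio(const struct toy_prot_change *c)
{
	return c->numa_hinting || c->anon_write_upgrade_possible;
}

/* Pick the cheap path whenever the folio would not be used anyway. */
static enum toy_batch_source
toy_pick_batch_source(const struct toy_prot_change *c)
{
	return toy_needs_folio(c) ? TOY_BATCH_FROM_FOLIO : TOY_BATCH_FROM_HINT;
}
```

The point of the sketch is only the ordering: the cheap predicate runs first,
so the expensive lookup is skipped on the common path where neither condition
holds.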


