linux-mm.kvack.org archive mirror
From: Dev Jain <dev.jain@arm.com>
To: Pedro Falcato <pfalcato@suse.de>
Cc: Luke Yang <luyang@redhat.com>,
	david@kernel.org, surenb@google.com, jhladky@redhat.com,
	akpm@linux-foundation.org, Liam.Howlett@oracle.com,
	willy@infradead.org, vbabka@suse.cz, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [REGRESSION] mm/mprotect: 2x+ slowdown for >=400KiB regions since PTE batching (cac1db8c3aad)
Date: Wed, 18 Feb 2026 16:08:11 +0530	[thread overview]
Message-ID: <eaa6be47-f1fc-4b88-b267-5aa38e3ba2a9@arm.com> (raw)
In-Reply-To: <5dso4ctke4baz7hky62zyfdzyg27tcikdbg5ecnrqmnluvmxzo@sciiqgatpqqv>


On 18/02/26 3:36 pm, Pedro Falcato wrote:
> On Wed, Feb 18, 2026 at 10:31:19AM +0530, Dev Jain wrote:
>> On 17/02/26 11:38 pm, Pedro Falcato wrote:
>>> On Tue, Feb 17, 2026 at 12:43:38PM -0500, Luke Yang wrote:
>>>> On Mon, Feb 16, 2026 at 03:42:08PM +0530, Dev Jain wrote:
>>>>> On 13/02/26 10:56 pm, David Hildenbrand (Arm) wrote:
>>>>>> On 2/13/26 18:16, Suren Baghdasaryan wrote:
>>>>>>> On Fri, Feb 13, 2026 at 4:24 PM Pedro Falcato <pfalcato@suse.de> wrote:
>>>>>>>> On Fri, Feb 13, 2026 at 04:47:29PM +0100, David Hildenbrand (Arm) wrote:
>>>>>>>>> Hi!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Micro-benchmark results are nice. But what is the real-world impact?
>>>>>>>>> IOW, why should we care?
>>>>>>>> Well, mprotect is widely used in thread spawning, code JITting,
>>>>>>>> and even process startup. And we don't want to pay for a feature we can't
>>>>>>>> even use (on x86).
>>>>>>> I agree. When I straced Android's zygote a while ago, mprotect() came
>>>>>>> up #30 in the list of most frequently used syscalls and one of the
>>>>>>> most used mm-related syscalls due to its use during process creation.
>>>>>>> However, I don't know how often it's used on VMAs of size >=400KiB.
>>>>>> See my point? :) If this is apparently so widespread then finding a real
>>>>>> reproducer is likely not a problem. Otherwise it's just speculation.
>>>>>>
>>>>>> It would also be interesting to know whether the reproducer ran with any
>>>>>> sort of mTHP enabled or not. 
>>>>> Yes. Luke, can you experiment with the following microbenchmark:
>>>>>
>>>>> https://pastebin.com/3hNtYirT
>>>>>
>>>>> and see if there is an optimization for pte-mapped 2M folios, before and
>>>>> after the commit?
>>>>>
>>>>> (set transparent_hugepage/enabled=always, hugepages-2048kB/enabled=always)
>>> Since you're testing stuff, could you please test the changes in:
>>> https://github.com/heatd/linux/tree/mprotect-opt ?
>>>
>>> Not posting them yet since merge window, etc. Plus I think there's some
>>> further optimization work we can pull off.
>>>
>>> With the benchmark in https://gist.github.com/heatd/25eb2edb601719d22bfb514bcf06a132
>>> (compiled with g++ -O2 file.cpp -lbenchmark, needs google/benchmark) I've measured
>>> about an 18% speedup between original vs with patches.
>> Thanks for working on this. Some comments -
>>
>> 1. Rejecting batching with pte_batch_hint() means that we also don't batch 16K and 32K large
>> folios on arm64, since the contig bit is only set starting at 64K. Not sure how important this is.
> I don't understand what you mean. Is ARM64 doing large folio optimization,
> even when there's no special MMU support for it (the aforementioned 16K and
> 32K cases)? If so, perhaps it's time for a ARCH_SUPPORTS_PTE_BATCHING flag.
> Though if you could provide numbers in that case it would be much appreciated.

There are two things at play here:

1. All arches are expected to benefit from PTE batching on large folios, because
similar operations are performed together in one shot. For code paths other than
mprotect and mremap, that benefit is far clearer, due to:

a) batching of atomic operations etc. For example, see copy_present_ptes -> folio_ref_add.
   Instead of bumping the refcount by 1, nr times, we bump it by nr in one shot.

b) vm_normal_folio() was already being invoked, so all in all the only new overhead
   we introduce is that of folio_pte_batch(_flags). In fact, since we already have the
   folio, I recall that we even special-case large folios separately from the
   small folio case, so 4K folio processing has no overhead.

2. Due to the requirements of contpte, ptep_get() on arm64 needs to gather the a/d bits
across a whole cont block, so each ptep_get() performs 16 PTE accesses. To avoid this,
batching becomes critical on arm64.


>
>> 2. Did you measure if there is an optimization due to just the first commit ("prefetch the next pte")?
> Yes, I could measure a sizeable improvement (perhaps some 5%). I tested on
> zen5 (which is a pretty beefy uarch) and the loop is so full of ~~crap~~
> features that the prefetcher seems to be doing a poor job, at least per my
> results.

Nice.

>
>> I actually had prefetch in mind - is it possible to do some kind of prefetch(pfn_to_page(pte_pfn(pte)))
>> to optimize the call to vm_normal_folio()?
> Certainly possible, but I suspect it doesn't make too much sense. You want to
> avoid bringing in the cacheline if possible. In the pte's case, I know we're
> probably going to look at it and modify it, and if I'm wrong it's just one
> cacheline we misprefetched (though I had some parallel convos and it might
> be that we need a branch there to avoid prefetching out of the PTE table).
> We would like to avoid bringing in the folio cacheline at all, even if we
> don't stall through some fancy prefetching or sheer CPU magic.

I'm not sure; this needs other opinions.

The question here becomes: should we prefer performance on 4K folios or on
large folios? As Luke reports in the other email, the benefit on PTE-mapped THP
was staggering.

I believe that if the sysadmin enables CONFIG_TRANSPARENT_HUGEPAGE, they accept
that the kernel will contain code written with the expectation that it will see
large folios. So, is it reasonable to penalize the folio order-0 case in preference
to the folio order > 0 case? If yes, we can simply stop batching when !IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE).

>

