From: Ryan Roberts <ryan.roberts@arm.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Dev Jain <dev.jain@arm.com>,
akpm@linux-foundation.org, david@redhat.com, willy@infradead.org,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
catalin.marinas@arm.com, will@kernel.org,
Liam.Howlett@oracle.com, vbabka@suse.cz, jannh@google.com,
anshuman.khandual@arm.com, peterx@redhat.com, joey.gouly@arm.com,
ioworker0@gmail.com, baohua@kernel.org, kevin.brodsky@arm.com,
quic_zhenhuah@quicinc.com, christophe.leroy@csgroup.eu,
yangyicong@hisilicon.com, linux-arm-kernel@lists.infradead.org,
hughd@google.com, yang@os.amperecomputing.com, ziy@nvidia.com
Subject: Re: [PATCH v4 3/4] mm: Optimize mprotect() by PTE-batching
Date: Tue, 1 Jul 2025 10:53:08 +0100 [thread overview]
Message-ID: <a2165422-0d78-4549-bc34-a8bbcdb6513f@arm.com> (raw)
In-Reply-To: <64c2ec85-87ed-4793-b0e9-a68eac1a8020@lucifer.local>
On 01/07/2025 09:51, Lorenzo Stoakes wrote:
> On Tue, Jul 01, 2025 at 09:30:51AM +0100, Ryan Roberts wrote:
>>>> In an ideal world we would flatten and just have mprotect_folio_pte_batch()
>>>> return the batch size considering all the relevant PTE bits AND the
>>>> AnonExclusive bit on the pages. IIRC one of Dev's earlier versions modified the
>>>> core folio_pte_batch() function to also look at the AnonExclusive bit, but I
>>>> really disliked changing that core function (I think others did too?).
>>>
>>> Yeah let's not change the core function.
>>>
>>> My suggestion is to have mprotect_folio_pte_batch() do this.
>>>
>>>>
>>>> So barring that approach, we are really only left with the batch and sub-batch
>>>> approach - although, yes, it could be abstracted more. We could maintain a
>>>> context struct that persists across all calls to mprotect_folio_pte_batch() and
>>>> it can use that to keep it's state to remember if we are in the middle of a
>>>> sub-batch and decide either to call folio_pte_batch() to get a new batch, or
>>>> call anon_exclusive_batch() to get the next sub-batch within the current batch.
>>>> But that started to feel overly abstracted to me.
>>>
>>> Having this nested batch/sub-batch loop really feels worse. You just get lost in
>>> the complexity here very easily.
>>>
>>> But i"m also not sure we need to maintain _that_ much state?
>>>
>>> We're already looping over all of the PTEs here, so abstracting _the entire
>>> loop_ and all the sub-batch stuff to another function, that is
>>> mprotect_folio_pte_batch() I think sensibly, so it handles this for you makes a
>>> ton of sense.
>>
>> So effectively turn mprotect_folio_pte_batch() into an iterator; have a struct
>> and a funtion to init the struct for the the number of ptes we want to iterate
>> over, then a per iteration function that progressively returns batches?
>
> Is that really necessary though?
>
> Idea is that mprotect_folio_pte_batch() returns the nr_ptes _taking into account
> the PAE stuff_.
The issue is the efficiency. Assuming we want to keep the PTE scan contained
within the core folio_pte_batch() function and we _don't_ want to add PAE
awareness to that function, then we have 2 separate, independent loops; one for
PTE scanning and the other for PAE scanning. If the first loop scans through ans
returns 512, but then the PAE scan returns 1, we return 1. If we don't remember
for the next time that we already determined we have a PTE batch of 512 (now
511) then we will rescan the 511 PTEs and potentially return 1 again due to PAE.
Then 510, then 509...
That feels inefficient to me. So I'd much rather just remember that we have a
batch of 512, then split into sub batches as needed for PAE compliance. Then we
only scan each PTE once and each struct page once.
But to achieve this, we either need to merge the 2 loops or we need to carry
state across function calls (i.e. like an iterator).
>
> Would this break anything?
It's not about breaking anything, it's about scanning efficiently. Perhaps you
don't think it's worth worrying about in practice?
>
> We might need to pass a flag to say 'don't account for this' for prot numa case.
Yep, another bool ;-)
>
>>
>> Then we just have a simple loop here that gets the next batch and processes it?
next prev parent reply other threads:[~2025-07-01 9:53 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-28 11:34 [PATCH v4 0/4] Optimize mprotect() for large folios Dev Jain
2025-06-28 11:34 ` [PATCH v4 1/4] mm: Optimize mprotect() for MM_CP_PROT_NUMA by batch-skipping PTEs Dev Jain
2025-06-30 9:42 ` Ryan Roberts
2025-06-30 9:49 ` Dev Jain
2025-06-30 9:55 ` Ryan Roberts
2025-06-30 10:05 ` Dev Jain
2025-06-30 11:25 ` Lorenzo Stoakes
2025-06-30 11:39 ` Ryan Roberts
2025-06-30 11:53 ` Lorenzo Stoakes
2025-06-30 11:40 ` Dev Jain
2025-06-30 11:51 ` Lorenzo Stoakes
2025-06-30 11:56 ` Dev Jain
2025-07-02 9:37 ` Lorenzo Stoakes
2025-07-02 15:01 ` Dev Jain
2025-07-02 15:37 ` Lorenzo Stoakes
2025-06-28 11:34 ` [PATCH v4 2/4] mm: Add batched versions of ptep_modify_prot_start/commit Dev Jain
2025-06-30 10:10 ` Ryan Roberts
2025-06-30 10:17 ` Dev Jain
2025-06-30 10:35 ` Ryan Roberts
2025-06-30 10:42 ` Dev Jain
2025-06-30 12:57 ` Lorenzo Stoakes
2025-07-01 4:44 ` Dev Jain
2025-07-01 7:33 ` Ryan Roberts
2025-07-01 8:06 ` Lorenzo Stoakes
2025-07-01 8:23 ` Ryan Roberts
2025-07-01 8:34 ` Lorenzo Stoakes
2025-06-28 11:34 ` [PATCH v4 3/4] mm: Optimize mprotect() by PTE-batching Dev Jain
2025-06-28 12:39 ` Dev Jain
2025-06-30 10:31 ` Ryan Roberts
2025-06-30 11:21 ` Dev Jain
2025-06-30 11:47 ` Dev Jain
2025-06-30 11:50 ` Ryan Roberts
2025-06-30 11:53 ` Dev Jain
2025-07-01 5:47 ` Dev Jain
2025-07-01 7:39 ` Ryan Roberts
2025-06-30 12:52 ` Lorenzo Stoakes
2025-07-01 5:30 ` Dev Jain
[not found] ` <ec2c3f60-43e9-47d9-9058-49d608845200@arm.com>
2025-07-01 8:06 ` Dev Jain
2025-07-01 8:24 ` Ryan Roberts
2025-07-01 8:15 ` Lorenzo Stoakes
2025-07-01 8:30 ` Ryan Roberts
2025-07-01 8:51 ` Lorenzo Stoakes
2025-07-01 9:53 ` Ryan Roberts [this message]
2025-07-01 10:21 ` Lorenzo Stoakes
2025-07-01 11:31 ` Ryan Roberts
2025-07-01 13:40 ` Lorenzo Stoakes
2025-07-02 10:32 ` Lorenzo Stoakes
2025-07-02 15:03 ` Dev Jain
2025-07-02 15:22 ` Lorenzo Stoakes
2025-07-03 12:59 ` David Hildenbrand
2025-06-28 11:34 ` [PATCH v4 4/4] arm64: Add batched versions of ptep_modify_prot_start/commit Dev Jain
2025-06-30 10:43 ` Ryan Roberts
2025-06-29 23:05 ` [PATCH v4 0/4] Optimize mprotect() for large folios Andrew Morton
2025-06-30 3:33 ` Dev Jain
2025-06-30 10:45 ` Ryan Roberts
2025-06-30 11:22 ` Dev Jain
2025-06-30 11:17 ` Lorenzo Stoakes
2025-06-30 11:25 ` Dev Jain
2025-06-30 11:27 ` Lorenzo Stoakes
2025-06-30 11:43 ` Dev Jain
2025-07-01 0:08 ` Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a2165422-0d78-4549-bc34-a8bbcdb6513f@arm.com \
--to=ryan.roberts@arm.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=anshuman.khandual@arm.com \
--cc=baohua@kernel.org \
--cc=catalin.marinas@arm.com \
--cc=christophe.leroy@csgroup.eu \
--cc=david@redhat.com \
--cc=dev.jain@arm.com \
--cc=hughd@google.com \
--cc=ioworker0@gmail.com \
--cc=jannh@google.com \
--cc=joey.gouly@arm.com \
--cc=kevin.brodsky@arm.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=peterx@redhat.com \
--cc=quic_zhenhuah@quicinc.com \
--cc=vbabka@suse.cz \
--cc=will@kernel.org \
--cc=willy@infradead.org \
--cc=yang@os.amperecomputing.com \
--cc=yangyicong@hisilicon.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox