From: Ryan Roberts <ryan.roberts@arm.com>
To: Dev Jain <dev.jain@arm.com>, akpm@linux-foundation.org
Cc: david@redhat.com, willy@infradead.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, catalin.marinas@arm.com,
	will@kernel.org, Liam.Howlett@oracle.com,
	lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com,
	anshuman.khandual@arm.com, peterx@redhat.com, joey.gouly@arm.com,
	ioworker0@gmail.com, baohua@kernel.org, kevin.brodsky@arm.com,
	quic_zhenhuah@quicinc.com, christophe.leroy@csgroup.eu,
	yangyicong@hisilicon.com, linux-arm-kernel@lists.infradead.org,
	hughd@google.com, yang@os.amperecomputing.com, ziy@nvidia.com
Subject: Re: [PATCH v4 2/4] mm: Add batched versions of ptep_modify_prot_start/commit
Date: Mon, 30 Jun 2025 11:35:12 +0100	[thread overview]
Message-ID: <ced934c3-a1ea-4d1c-954a-613eb20a9105@arm.com> (raw)
In-Reply-To: <e1ce9e4e-03e2-4133-bbf6-9e0533dd13b1@arm.com>

On 30/06/2025 11:17, Dev Jain wrote:
> 
> On 30/06/25 3:40 pm, Ryan Roberts wrote:
>> On 28/06/2025 12:34, Dev Jain wrote:
>>> Batch ptep_modify_prot_start/commit in preparation for optimizing mprotect.
>>> Architecture can override these helpers; in case not, they are implemented
>>> as a simple loop over the corresponding single pte helpers.
>>>
>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>> ---
>>>   include/linux/pgtable.h | 83 ++++++++++++++++++++++++++++++++++++++++-
>>>   mm/mprotect.c           |  4 +-
>>>   2 files changed, 84 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>>> index cf1515c163e2..662f39e7475a 100644
>>> --- a/include/linux/pgtable.h
>>> +++ b/include/linux/pgtable.h
>>> @@ -1331,7 +1331,8 @@ static inline pte_t ptep_modify_prot_start(struct vm_area_struct *vma,
>>>     /*
>>>    * Commit an update to a pte, leaving any hardware-controlled bits in
>>> - * the PTE unmodified.
>>> + * the PTE unmodified. The pte may have been "upgraded" w.r.t a/d bits compared
>>> + * to the old_pte, as in, it may have a/d bits on which were off in old_pte.
>> I find this last sentence a bit confusing. I think what you are trying to say is
>> something like:
>>
>> """
>> old_pte is the value returned from ptep_modify_prot_start() but may additionally
>> have young and/or dirty bits set where previously they were not.
>> """
> 
> Thanks.
> 
>> ?
>>
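(To spell out why I'd word it that way: the generic fallback below folds the
a/d bits of every pte in the batch into the value it returns, so - purely as an
illustration, not code from the patch:

	old_pte = modify_prot_start_ptes(vma, addr, ptep, nr);
	/*
	 * old_pte can now report young/dirty even if the first pte did not,
	 * because some other pte in the batch had those bits set.
	 */

i.e. the caller can only treat old_pte as the first entry's value, possibly
"upgraded" with young/dirty collected from the rest of the batch.)
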
>>>    */
>>>   static inline void ptep_modify_prot_commit(struct vm_area_struct *vma,
>>>                          unsigned long addr,
>>> @@ -1340,6 +1341,86 @@ static inline void ptep_modify_prot_commit(struct vm_area_struct *vma,
>>>       __ptep_modify_prot_commit(vma, addr, ptep, pte);
>>>   }
>>>   #endif /* __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION */
>>> +
>>> +/**
>>> + * modify_prot_start_ptes - Start a pte protection read-modify-write transaction
>>> + * over a batch of ptes, which protects against asynchronous hardware
>>> + * modifications to the ptes. The intention is not to prevent the hardware from
>>> + * making pte updates, but to prevent any updates it may make from being lost.
>>> + * Please see the comment above ptep_modify_prot_start() for full description.
>>> + *
>>> + * @vma: The virtual memory area the pages are mapped into.
>>> + * @addr: Address the first page is mapped at.
>>> + * @ptep: Page table pointer for the first entry.
>>> + * @nr: Number of entries.
>>> + *
>>> + * May be overridden by the architecture; otherwise, implemented as a simple
>>> + * loop over ptep_modify_prot_start(), collecting the a/d bits from each pte
>>> + * in the batch.
>>> + *
>>> + * Note that PTE bits in the PTE batch besides the PFN can differ.
>>> + *
>>> + * Context: The caller holds the page table lock.  The PTEs map consecutive
>>> + * pages that belong to the same folio.  The PTEs are all in the same PMD.
>>> + * Since the batch is determined from folio_pte_batch, the PTEs must differ
>>> + * only in a/d bits (and the soft dirty bit; see fpb_t flags in
>>> + * mprotect_folio_pte_batch()).
>> This last sentence is confusing... You had previously said the PFN can differ, but
>> here you imply only a, d and sd bits are allowed to differ.
> 
> Forgot to mention the PFNs, kind of took them as implied. So mentioning the PFNs
> also will do, or do you suggest a better wording?

Perhaps:

"""
Context: The caller holds the page table lock.  The PTEs map consecutive
pages that belong to the same folio.  All other PTE bits must be identical for
all PTEs in the batch except for young and dirty bits.  The PTEs are all in the
same PMD.
"""

You mention the soft dirty bit not needing to be the same in your current
wording, but I don't think that is correct? Soft dirty needs to be the same, right?
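
(To make the concern concrete - illustrative only, not code from the patch: the
generic modify_prot_commit_ptes() below writes the same template pte to every
entry, advancing only the pfn via pte_next_pfn(), e.g.:

	modify_prot_commit_ptes(vma, addr, ptep, old_pte, pte, nr);
	/*
	 * Every entry now carries pte's non-pfn bits, so any soft-dirty
	 * difference that existed across the batch has been erased.
	 */

which is why I'd expect soft-dirty to be required to match within the batch.)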

> 
>>
>>> + */
>>> +#ifndef modify_prot_start_ptes
>>> +static inline pte_t modify_prot_start_ptes(struct vm_area_struct *vma,
>>> +        unsigned long addr, pte_t *ptep, unsigned int nr)
>>> +{
>>> +    pte_t pte, tmp_pte;
>>> +
>>> +    pte = ptep_modify_prot_start(vma, addr, ptep);
>>> +    while (--nr) {
>>> +        ptep++;
>>> +        addr += PAGE_SIZE;
>>> +        tmp_pte = ptep_modify_prot_start(vma, addr, ptep);
>>> +        if (pte_dirty(tmp_pte))
>>> +            pte = pte_mkdirty(pte);
>>> +        if (pte_young(tmp_pte))
>>> +            pte = pte_mkyoung(pte);
>>> +    }
>>> +    return pte;
>>> +}
>>> +#endif
>>> +
>>> +/**
>>> + * modify_prot_commit_ptes - Commit an update to a batch of ptes, leaving any
>>> + * hardware-controlled bits in the PTE unmodified.
>>> + *
>>> + * @vma: The virtual memory area the pages are mapped into.
>>> + * @addr: Address the first page is mapped at.
>>> + * @ptep: Page table pointer for the first entry.
>>> + * @old_pte: Old page table entry (for the first entry) which is now cleared.
>>> + * @pte: New page table entry to be set.
>>> + * @nr: Number of entries.
>>> + *
>>> + * May be overridden by the architecture; otherwise, implemented as a simple
>>> + * loop over ptep_modify_prot_commit().
>>> + *
>>> + * Context: The caller holds the page table lock. The PTEs are all in the same
>>> + * PMD. On exit, the set ptes in the batch map the same folio. The pte may have
>>> + * been "upgraded" w.r.t a/d bits compared to the old_pte, as in, it may have
>>> + * a/d bits on which were off in old_pte.
>> Same comment as for ptep_modify_prot_start().
>>
>>> + */
>>> +#ifndef modify_prot_commit_ptes
>>> +static inline void modify_prot_commit_ptes(struct vm_area_struct *vma, unsigned long addr,
>>> +        pte_t *ptep, pte_t old_pte, pte_t pte, unsigned int nr)
>>> +{
>>> +    int i;
>>> +
>>> +    for (i = 0; i < nr; ++i) {
>>> +        ptep_modify_prot_commit(vma, addr, ptep, old_pte, pte);
>>> +        ptep++;
>>> +        addr += PAGE_SIZE;
>>> +        old_pte = pte_next_pfn(old_pte);
>>> +        pte = pte_next_pfn(pte);
>>> +    }
>>> +}
>>> +#endif
>>> +
>>>   #endif /* CONFIG_MMU */
>>>     /*
>>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>>> index af10a7fbe6b8..627b0d67cc4a 100644
>>> --- a/mm/mprotect.c
>>> +++ b/mm/mprotect.c
>>> @@ -206,7 +206,7 @@ static long change_pte_range(struct mmu_gather *tlb,
>>>                       continue;
>>>               }
>>>   -            oldpte = ptep_modify_prot_start(vma, addr, pte);
>>> +            oldpte = modify_prot_start_ptes(vma, addr, pte, nr_ptes);
>> You're calling this with nr_ptes = 0 for the prot_numa case. But the
>> implementation expects minimum nr_ptes == 1.
> 
> This will get fixed when I force nr_ptes = 1 in the previous patch, right?

Yep, just pointing it out.
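
(For the record, the failure mode in the generic fallback above: nr is unsigned,
so with nr == 0 the decrement wraps instead of skipping the loop, roughly:

	pte = ptep_modify_prot_start(vma, addr, ptep);	/* still touches one pte */
	while (--nr)	/* 0 - 1 wraps to UINT_MAX, so the loop walks far past the batch */
		;

so as long as the previous patch guarantees nr_ptes >= 1 we're fine.)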

> 
>>
>>>               ptent = pte_modify(oldpte, newprot);
>>>                 if (uffd_wp)
>>> @@ -232,7 +232,7 @@ static long change_pte_range(struct mmu_gather *tlb,
>>>                   can_change_pte_writable(vma, addr, ptent))
>>>                   ptent = pte_mkwrite(ptent, vma);
>>>   -            ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
>>> +            modify_prot_commit_ptes(vma, addr, pte, oldpte, ptent, nr_ptes);
>>>               if (pte_needs_flush(oldpte, ptent))
>>>                   tlb_flush_pte_range(tlb, addr, PAGE_SIZE);
>>>               pages++;


