linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Zhang Qilong <zhangqilong3@huawei.com>,
	akpm@linux-foundation.org, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, jannh@google.com,
	pfalcato@suse.de
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	wangkefeng.wang@huawei.com, sunnanyong@huawei.com
Subject: Re: [RFC PATCH 1/3] mm: Introduce can_pte_batch_count() for PTEs batch optimization.
Date: Mon, 27 Oct 2025 20:24:40 +0100	[thread overview]
Message-ID: <276b70aa-9853-40cc-8e7d-e790166706b5@redhat.com> (raw)
In-Reply-To: <20251027140315.907864-2-zhangqilong3@huawei.com>

On 27.10.25 15:03, Zhang Qilong wrote:
> Currently, the PTEs batch requires folio access, with the maximum
> quantity limited to the PFNs contained within the folio. However,
> in certain cases (such as mremap_folio_pte_batch and mincore_pte_range),
> accessing the folio is unnecessary and expensive.
> 
> For scenarios that do not require folio access, this patch introduces
> can_pte_batch_count(). With contiguous physical addresses and identical
> PTE attribute bits, we can now process more page table entries at once,
> in batch, not just those mapped within a single folio. It also
> avoids the folio access.
> 
> Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
> ---
>   mm/internal.h | 76 +++++++++++++++++++++++++++++++++++++++------------
>   1 file changed, 58 insertions(+), 18 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index 1561fc2ff5b8..92034ca9092d 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -233,61 +233,62 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
>   		pte = pte_wrprotect(pte);
>   	return pte_mkold(pte);
>   }
>   
>   /**
> - * folio_pte_batch_flags - detect a PTE batch for a large folio
> - * @folio: The large folio to detect a PTE batch for.
> + * can_pte_batch_count - detect a PTE batch in the range [ptep, ptep + max_nr)

I really don't like the name.

Maybe it's just pte_batch().

>    * @vma: The VMA. Only relevant with FPB_MERGE_WRITE, otherwise can be NULL.
>    * @ptep: Page table pointer for the first entry.
>    * @ptentp: Pointer to a COPY of the first page table entry whose flags this
>    *	    function updates based on @flags if appropriate.
>    * @max_nr: The maximum number of table entries to consider.
>    * @flags: Flags to modify the PTE batch semantics.
>    *
> - * Detect a PTE batch: consecutive (present) PTEs that map consecutive
> - * pages of the same large folio in a single VMA and a single page table.
> + * This interface is designed for cases that do not require folio access.
> + * If folio consideration is needed, call folio_pte_batch_flags() instead.
> + *
> + * Detect a PTE batch: consecutive (present) PTEs that map consecutive pages
> + * in a single VMA and a single page table.
>    *
>    * All PTEs inside a PTE batch have the same PTE bits set, excluding the PFN,
>    * the accessed bit, writable bit, dirty bit (unless FPB_RESPECT_DIRTY is set)
>    * and soft-dirty bit (unless FPB_RESPECT_SOFT_DIRTY is set).
>    *
> - * @ptep must map any page of the folio. max_nr must be at least one and
> + * @ptep points to the first entry in the range. max_nr must be at least one and
>    * must be limited by the caller so scanning cannot exceed a single VMA and
>    * a single page table.
>    *
>    * Depending on the FPB_MERGE_* flags, the pte stored at @ptentp will
>    * be updated: it's crucial that a pointer to a COPY of the first
>    * page table entry, obtained through ptep_get(), is provided as @ptentp.
>    *
> - * This function will be inlined to optimize based on the input parameters;
> - * consider using folio_pte_batch() instead if applicable.
> + * folio_pte_batch_flags() below deals with PTEs mapped within a single
> + * folio, whereas can_pte_batch_count() can handle PTEs that map
> + * consecutive folios. If no flags are set, the accessed, writable and
> + * dirty bits are ignored. When a flag is set, the respected bit(s) are
> + * compared in pte_same(): if a respected PTE bit differs from the
> + * expected value, pte_same() returns false and the batch ends. This
> + * ensures correctness when handling PTEs across multiple folios.
> + *
> + * This function will be inlined to optimize based on the input parameters.
>    *
>    * Return: the number of table entries in the batch.
>    */

I recall trouble if we try batching across folios:

commit 7b08b74f3d99f6b801250683c751d391128799ec (tag: mm-hotfixes-stable-2025-05-10-14-23)
Author: Petr Vaněk <arkamar@atlas.cz>
Date:   Fri May 2 23:50:19 2025 +0200

     mm: fix folio_pte_batch() on XEN PV
     
     On XEN PV, folio_pte_batch() can incorrectly batch beyond the end of a
     folio due to a corner case in pte_advance_pfn().  Specifically, when the
     PFN following the folio maps to an invalidated MFN,
     
             expected_pte = pte_advance_pfn(expected_pte, nr);
     
     produces a pte_none().  If the actual next PTE in memory is also
     pte_none(), the pte_same() succeeds,
     
             if (!pte_same(pte, expected_pte))
                     break;
     
     the loop is not broken, and batching continues into unrelated memory.
     
...


-- 
Cheers

David / dhildenb




Thread overview: 11+ messages
2025-10-27 14:03 [RFC PATCH 0/3] mm: PTEs batch optimization in mincore and mremap Zhang Qilong
2025-10-27 14:03 ` [RFC PATCH 1/3] mm: Introduce can_pte_batch_count() for PTEs batch optimization Zhang Qilong
2025-10-27 19:24   ` David Hildenbrand [this message]
2025-10-27 19:51     ` Lorenzo Stoakes
2025-10-27 20:21       ` Ryan Roberts
2025-10-27 14:03 ` [RFC PATCH 2/3] mm/mincore: Use can_pte_batch_count() in mincore_pte_range() for pte batch mincore_pte_range() Zhang Qilong
2025-10-27 19:27   ` David Hildenbrand
2025-10-27 19:34   ` Lorenzo Stoakes
2025-10-27 14:03 ` [RFC PATCH 3/3] mm/mremap: Use can_pte_batch_count() instead of folio_pte_batch() for pte batch Zhang Qilong
2025-10-27 19:41   ` David Hildenbrand
2025-10-27 19:57   ` Lorenzo Stoakes
