From: David Hildenbrand <david@redhat.com>
To: Zhang Qilong <zhangqilong3@huawei.com>,
akpm@linux-foundation.org, lorenzo.stoakes@oracle.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, jannh@google.com,
pfalcato@suse.de
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
wangkefeng.wang@huawei.com, sunnanyong@huawei.com
Subject: Re: [RFC PATCH 1/3] mm: Introduce can_pte_batch_count() for PTEs batch optimization.
Date: Mon, 27 Oct 2025 20:24:40 +0100
Message-ID: <276b70aa-9853-40cc-8e7d-e790166706b5@redhat.com>
In-Reply-To: <20251027140315.907864-2-zhangqilong3@huawei.com>
On 27.10.25 15:03, Zhang Qilong wrote:
> Currently, PTE batching requires folio access, with the batch size
> limited to the PFNs contained within a single folio. However, in
> certain cases (such as mremap_folio_pte_batch() and mincore_pte_range()),
> accessing the folio is unnecessary and expensive.
>
> For scenarios that do not require folio access, this patch introduces
> can_pte_batch_count(). With contiguous physical addresses and identical
> PTE attribute bits, we can now process more page table entries at once,
> in a batch no longer limited to entries mapped within a single folio,
> while also avoiding the folio access.
>
> Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
> ---
> mm/internal.h | 76 +++++++++++++++++++++++++++++++++++++++------------
> 1 file changed, 58 insertions(+), 18 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 1561fc2ff5b8..92034ca9092d 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -233,61 +233,62 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
> pte = pte_wrprotect(pte);
> return pte_mkold(pte);
> }
>
> /**
> - * folio_pte_batch_flags - detect a PTE batch for a large folio
> - * @folio: The large folio to detect a PTE batch for.
> + * can_pte_batch_count - detect a PTE batch in range [ptep, to ptep + max_nr)
I really don't like the name.
Maybe it's just pte_batch().
> * @vma: The VMA. Only relevant with FPB_MERGE_WRITE, otherwise can be NULL.
> * @ptep: Page table pointer for the first entry.
> * @ptentp: Pointer to a COPY of the first page table entry whose flags this
> * function updates based on @flags if appropriate.
> * @max_nr: The maximum number of table entries to consider.
> * @flags: Flags to modify the PTE batch semantics.
> *
> - * Detect a PTE batch: consecutive (present) PTEs that map consecutive
> - * pages of the same large folio in a single VMA and a single page table.
> + * This interface is designed for cases that do not require folio access.
> + * If folio handling is needed, call folio_pte_batch_flags() instead.
> + *
> + * Detect a PTE batch: consecutive (present) PTEs that map consecutive pages
> + * in a single VMA and a single page table.
> *
> * All PTEs inside a PTE batch have the same PTE bits set, excluding the PFN,
> * the accessed bit, writable bit, dirty bit (unless FPB_RESPECT_DIRTY is set)
> * and soft-dirty bit (unless FPB_RESPECT_SOFT_DIRTY is set).
> *
> - * @ptep must map any page of the folio. max_nr must be at least one and
> + * @ptep points to the first entry in the range. @max_nr must be at least one
> * must be limited by the caller so scanning cannot exceed a single VMA and
> * a single page table.
> *
> * Depending on the FPB_MERGE_* flags, the pte stored at @ptentp will
> * be updated: it's crucial that a pointer to a COPY of the first
> * page table entry, obtained through ptep_get(), is provided as @ptentp.
> *
> - * This function will be inlined to optimize based on the input parameters;
> - * consider using folio_pte_batch() instead if applicable.
> + * Unlike folio_pte_batch_flags(), which handles PTEs mapped within a
> + * single folio, can_pte_batch_count() can handle PTEs that map
> + * consecutive folios. If no flags are set, the accessed, writable and
> + * dirty bits are ignored. When the respective flags are set, those bits
> + * are compared by pte_same(): if a respected bit of the PTE advanced by
> + * pte_batch_hint() differs, pte_same() returns false and the batch is
> + * terminated. This ensures correctness when batching across folios.
> + *
> + * This function will be inlined to optimize based on the input parameters.
> *
> * Return: the number of table entries in the batch.
> */
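To make the proposed semantics concrete, here is a minimal userspace model of the batch detection described above. The PTE layout, the attribute masks and all names here are illustrative inventions, not the kernel's actual definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy PTE layout (NOT the kernel's): PFN in the high bits,
 * attribute bits in the low PFN_SHIFT bits.
 */
#define PFN_SHIFT     12
#define ATTR_ACCESSED (1ULL << 0)
#define ATTR_DIRTY    (1ULL << 1)
#define ATTR_WRITE    (1ULL << 2)

/* Bits ignored by default, as the patch description says. */
#define IGNORED (ATTR_ACCESSED | ATTR_DIRTY | ATTR_WRITE)

static uint64_t clear_ignored(uint64_t pte)
{
	return pte & ~IGNORED;
}

/*
 * Count consecutive entries whose PFNs increase by one and whose
 * remaining attribute bits match the first entry's. No folio is
 * consulted, so the batch may span multiple folios.
 */
static size_t pte_batch_model(const uint64_t *ptep, size_t max_nr)
{
	uint64_t expected = clear_ignored(ptep[0]);
	size_t nr;

	for (nr = 1; nr < max_nr; nr++) {
		expected += 1ULL << PFN_SHIFT;  /* advance PFN by one */
		if (clear_ignored(ptep[nr]) != expected)
			break;
	}
	return nr;
}
```

With three entries at PFNs 100..102 (differing only in ignored A/D/W bits) followed by a discontiguous PFN, the model returns a batch of 3.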
I recall trouble if we try batching across folios:
commit 7b08b74f3d99f6b801250683c751d391128799ec (tag: mm-hotfixes-stable-2025-05-10-14-23)
Author: Petr Vaněk <arkamar@atlas.cz>
Date: Fri May 2 23:50:19 2025 +0200
mm: fix folio_pte_batch() on XEN PV
On XEN PV, folio_pte_batch() can incorrectly batch beyond the end of a
folio due to a corner case in pte_advance_pfn(). Specifically, when the
PFN following the folio maps to an invalidated MFN,
expected_pte = pte_advance_pfn(expected_pte, nr);
produces a pte_none(). If the actual next PTE in memory is also
pte_none(), the pte_same() succeeds,
if (!pte_same(pte, expected_pte))
break;
the loop is not broken, and batching continues into unrelated memory.
...
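That corner case can be modeled in userspace as follows. Everything here is an illustrative stand-in (a zero entry plays the role of pte_none(), and an arbitrary "bad" PFN models an invalidated MFN); it only shows the shape of the bug, not XEN internals:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PFN_SHIFT 12
#define BAD_PFN   103  /* pretend this PFN's MFN is invalidated */

/*
 * Model of pte_advance_pfn() on XEN PV: advancing onto an
 * invalidated MFN yields a none (zero) entry.
 */
static uint64_t advance_pfn_model(uint64_t pte)
{
	uint64_t pfn = (pte >> PFN_SHIFT) + 1;

	if (pfn == BAD_PFN)
		return 0;  /* "advanced" PTE collapses to none */
	return (pfn << PFN_SHIFT) | (pte & ((1ULL << PFN_SHIFT) - 1));
}

/* Batch loop with the buggy behavior: nothing stops it at the folio end. */
static size_t buggy_batch(const uint64_t *ptep, size_t max_nr)
{
	uint64_t expected = ptep[0];
	size_t nr;

	for (nr = 1; nr < max_nr; nr++) {
		expected = advance_pfn_model(expected);
		if (ptep[nr] != expected)  /* the pte_same() check */
			break;
	}
	return nr;
}
```

With real PTEs at PFNs 100..102 followed by none entries, the advanced "expected" PTE also becomes none at PFN 103, the comparison succeeds, and the loop counts one entry past the last real PTE.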
--
Cheers
David / dhildenb
Thread overview: 11+ messages
2025-10-27 14:03 [RFC PATCH 0/3] mm: PTEs batch optimization in mincore and mremap Zhang Qilong
2025-10-27 14:03 ` [RFC PATCH 1/3] mm: Introduce can_pte_batch_count() for PTEs batch optimization Zhang Qilong
2025-10-27 19:24 ` David Hildenbrand [this message]
2025-10-27 19:51 ` Lorenzo Stoakes
2025-10-27 20:21 ` Ryan Roberts
2025-10-27 14:03 ` [RFC PATCH 2/3] mm/mincore: Use can_pte_batch_count() in mincore_pte_range() for pte batch Zhang Qilong
2025-10-27 19:27 ` David Hildenbrand
2025-10-27 19:34 ` Lorenzo Stoakes
2025-10-27 14:03 ` [RFC PATCH 3/3] mm/mremap: Use can_pte_batch_count() instead of folio_pte_batch() for pte batch Zhang Qilong
2025-10-27 19:41 ` David Hildenbrand
2025-10-27 19:57 ` Lorenzo Stoakes