From: Zhang Qilong <zhangqilong3@huawei.com>
To: <akpm@linux-foundation.org>, <david@redhat.com>,
<lorenzo.stoakes@oracle.com>, <Liam.Howlett@oracle.com>,
<vbabka@suse.cz>, <rppt@kernel.org>, <surenb@google.com>,
<mhocko@suse.com>, <jannh@google.com>, <pfalcato@suse.de>
Cc: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
<wangkefeng.wang@huawei.com>, <sunnanyong@huawei.com>
Subject: [RFC PATCH 1/3] mm: Introduce can_pte_batch_count() for PTEs batch optimization.
Date: Mon, 27 Oct 2025 22:03:13 +0800
Message-ID: <20251027140315.907864-2-zhangqilong3@huawei.com>
In-Reply-To: <20251027140315.907864-1-zhangqilong3@huawei.com>
Currently, PTE batching requires folio access, and the batch size is
limited to the PFNs contained within the folio. However, in certain
cases (such as mremap_folio_pte_batch() and mincore_pte_range()),
accessing the folio is unnecessary and expensive.

For scenarios that do not require folio access, this patch introduces
can_pte_batch_count(). With contiguous physical addresses and identical
PTE attribute bits, we can now process more page table entries at once,
in a batch that is not limited to entries mapped within a single folio.
It also avoids the folio access.
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
---
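Note (not part of the commit message): a minimal userspace sketch of the
batching rule described above, for illustration only. The toy_* names and
the bit layout are hypothetical and are not kernel APIs; real PTE encodings
are architecture-specific, and the sketch leaves out pte_batch_hint() and
the FPB_* flags entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy PTE: low 12 bits hold flags, the remaining bits the PFN. */
typedef uint64_t toy_pte_t;

#define TOY_PRESENT	(1ull << 0)
#define TOY_WRITE	(1ull << 1)
#define TOY_YOUNG	(1ull << 2)
#define TOY_DIRTY	(1ull << 3)
#define TOY_PFN_SHIFT	12

/* Bits the batch comparison ignores in this sketch. */
#define TOY_IGNORED	(TOY_WRITE | TOY_YOUNG | TOY_DIRTY)

toy_pte_t toy_mk_pte(uint64_t pfn, uint64_t flags)
{
	return (pfn << TOY_PFN_SHIFT) | flags;
}

uint64_t toy_pte_pfn(toy_pte_t pte)
{
	return pte >> TOY_PFN_SHIFT;
}

toy_pte_t toy_clear_ignored(toy_pte_t pte)
{
	return pte & ~TOY_IGNORED;
}

/*
 * Count the entries, starting at ptep[0], that map consecutive PFNs and
 * agree on every non-ignored bit. No folio is consulted.
 */
unsigned int toy_pte_batch_count(const toy_pte_t *ptep, unsigned int max_nr)
{
	toy_pte_t expected;
	unsigned int nr = 1;

	assert(max_nr >= 1 && (ptep[0] & TOY_PRESENT));

	/* The next entry must look like the first one with PFN + 1. */
	expected = toy_clear_ignored(ptep[0]) + (1ull << TOY_PFN_SHIFT);
	while (nr < max_nr) {
		if (toy_clear_ignored(ptep[nr]) != expected)
			break;
		expected += 1ull << TOY_PFN_SHIFT;
		nr++;
	}
	return nr;
}

/*
 * Folio-limited variant: clamp max_nr to the pages remaining in the folio
 * that contains ptep[0], then reuse the folio-agnostic scan.
 */
unsigned int toy_folio_pte_batch(uint64_t folio_pfn,
				 unsigned int folio_nr_pages,
				 const toy_pte_t *ptep, unsigned int max_nr)
{
	uint64_t remaining = folio_pfn + folio_nr_pages - toy_pte_pfn(ptep[0]);

	if (remaining < max_nr)
		max_nr = (unsigned int)remaining;
	return toy_pte_batch_count(ptep, max_nr);
}
```

With four entries mapping PFNs 100..103 that differ only in the dirty bit,
toy_pte_batch_count() returns 4, whereas toy_folio_pte_batch() for a
two-page folio starting at PFN 100 stops at 2.
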
mm/internal.h | 76 +++++++++++++++++++++++++++++++++++++++------------
1 file changed, 58 insertions(+), 18 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 1561fc2ff5b8..92034ca9092d 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -233,61 +233,62 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
pte = pte_wrprotect(pte);
return pte_mkold(pte);
}
/**
- * folio_pte_batch_flags - detect a PTE batch for a large folio
- * @folio: The large folio to detect a PTE batch for.
+ * can_pte_batch_count - detect a PTE batch in range [ptep, ptep + max_nr)
* @vma: The VMA. Only relevant with FPB_MERGE_WRITE, otherwise can be NULL.
* @ptep: Page table pointer for the first entry.
* @ptentp: Pointer to a COPY of the first page table entry whose flags this
* function updates based on @flags if appropriate.
* @max_nr: The maximum number of table entries to consider.
* @flags: Flags to modify the PTE batch semantics.
*
- * Detect a PTE batch: consecutive (present) PTEs that map consecutive
- * pages of the same large folio in a single VMA and a single page table.
+ * This interface is designed for cases that do not require folio access.
+ * If folio consideration is needed, use folio_pte_batch_flags() instead.
+ *
+ * Detect a PTE batch: consecutive (present) PTEs that map consecutive pages
+ * in a single VMA and a single page table.
*
* All PTEs inside a PTE batch have the same PTE bits set, excluding the PFN,
* the accessed bit, writable bit, dirty bit (unless FPB_RESPECT_DIRTY is set)
* and soft-dirty bit (unless FPB_RESPECT_SOFT_DIRTY is set).
*
- * @ptep must map any page of the folio. max_nr must be at least one and
+ * @ptep points to the first entry in the range. max_nr must be at least one and
* must be limited by the caller so scanning cannot exceed a single VMA and
* a single page table.
*
* Depending on the FPB_MERGE_* flags, the pte stored at @ptentp will
* be updated: it's crucial that a pointer to a COPY of the first
* page table entry, obtained through ptep_get(), is provided as @ptentp.
*
- * This function will be inlined to optimize based on the input parameters;
- * consider using folio_pte_batch() instead if applicable.
+ * The following folio_pte_batch_flags() deals with PTEs that are mapped in a
+ * single folio, whereas can_pte_batch_count() can handle PTEs that are mapped
+ * in consecutive folios. If no flags are set, it ignores the accessed,
+ * writable and dirty bits. Once an FPB_RESPECT_* flag is set, the respected
+ * bit(s) are compared in pte_same(); if a respected bit differs after
+ * advancing by pte_batch_hint(), pte_same() returns false and the loop
+ * breaks. This ensures the correctness of handling PTEs of multiple folios.
+ *
+ * This function will be inlined to optimize based on the input parameters.
*
* Return: the number of table entries in the batch.
*/
-static inline unsigned int folio_pte_batch_flags(struct folio *folio,
- struct vm_area_struct *vma, pte_t *ptep, pte_t *ptentp,
- unsigned int max_nr, fpb_t flags)
+static inline unsigned int can_pte_batch_count(struct vm_area_struct *vma,
+ pte_t *ptep, pte_t *ptentp, unsigned int max_nr, fpb_t flags)
{
bool any_writable = false, any_young = false, any_dirty = false;
pte_t expected_pte, pte = *ptentp;
unsigned int nr, cur_nr;
- VM_WARN_ON_FOLIO(!pte_present(pte), folio);
- VM_WARN_ON_FOLIO(!folio_test_large(folio) || max_nr < 1, folio);
- VM_WARN_ON_FOLIO(page_folio(pfn_to_page(pte_pfn(pte))) != folio, folio);
+ VM_WARN_ON(!pte_present(pte));
/*
* Ensure this is a pointer to a copy not a pointer into a page table.
* If this is a stack value, it won't be a valid virtual address, but
* that's fine because it also cannot be pointing into the page table.
*/
VM_WARN_ON(virt_addr_valid(ptentp) && PageTable(virt_to_page(ptentp)));
-
- /* Limit max_nr to the actual remaining PFNs in the folio we could batch. */
- max_nr = min_t(unsigned long, max_nr,
- folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte));
-
nr = pte_batch_hint(ptep, pte);
expected_pte = __pte_batch_clear_ignored(pte_advance_pfn(pte, nr), flags);
ptep = ptep + nr;
while (nr < max_nr) {
@@ -317,10 +318,49 @@ static inline unsigned int folio_pte_batch_flags(struct folio *folio,
*ptentp = pte_mkdirty(*ptentp);
return min(nr, max_nr);
}
+/**
+ * folio_pte_batch_flags - detect a PTE batch for a large folio
+ * @folio: The large folio to detect a PTE batch for.
+ * @vma: The VMA. Only relevant with FPB_MERGE_WRITE, otherwise can be NULL.
+ * @ptep: Page table pointer for the first entry.
+ * @ptentp: Pointer to a COPY of the first page table entry whose flags this
+ * function updates based on @flags if appropriate.
+ * @max_nr: The maximum number of table entries to consider.
+ * @flags: Flags to modify the PTE batch semantics.
+ *
+ * Detect a PTE batch: consecutive (present) PTEs that map consecutive
+ * pages of the same large folio and have the same PTE bits set excluding the
+ * PFN, the accessed bit, writable bit, dirty bit (unless FPB_RESPECT_DIRTY
+ * is set) and soft-dirty bit (unless FPB_RESPECT_SOFT_DIRTY is set).
+ *
+ * @ptep must map any page of the folio.
+ *
+ * This function will be inlined to optimize based on the input parameters;
+ * consider using folio_pte_batch() instead if applicable.
+ *
+ * Return: the number of table entries in the batch.
+ */
+static inline unsigned int folio_pte_batch_flags(struct folio *folio,
+ struct vm_area_struct *vma, pte_t *ptep, pte_t *ptentp,
+ unsigned int max_nr, fpb_t flags)
+{
+ pte_t pte = *ptentp;
+
+ VM_WARN_ON_FOLIO(!pte_present(pte), folio);
+ VM_WARN_ON_FOLIO(!folio_test_large(folio) || max_nr < 1, folio);
+ VM_WARN_ON_FOLIO(page_folio(pfn_to_page(pte_pfn(pte))) != folio, folio);
+
+ /* Limit max_nr to the actual remaining PFNs in the folio we could batch. */
+ max_nr = min_t(unsigned long, max_nr,
+ folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte));
+
+ return can_pte_batch_count(vma, ptep, ptentp, max_nr, flags);
+}
+
unsigned int folio_pte_batch(struct folio *folio, pte_t *ptep, pte_t pte,
unsigned int max_nr);
/**
* pte_move_swp_offset - Move the swap entry offset field of a swap pte
--
2.43.0