From: David Hildenbrand <david@redhat.com>
To: Dev Jain <dev.jain@arm.com>, akpm@linux-foundation.org
Cc: Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com,
	vbabka@suse.cz, jannh@google.com, pfalcato@suse.de,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	peterx@redhat.com, ryan.roberts@arm.com, mingo@kernel.org,
	libang.li@antgroup.com, maobibo@loongson.cn,
	zhengqi.arch@bytedance.com, baohua@kernel.org,
	anshuman.khandual@arm.com, willy@infradead.org,
	ioworker0@gmail.com, yang@os.amperecomputing.com
Subject: Re: [PATCH 2/3] mm: Add generic helper to hint a large folio
Date: Thu, 8 May 2025 12:55:48 +0200
Message-ID: <0979ce4e-d316-477c-872e-d3f9e47690e5@redhat.com>
In-Reply-To: <b104b843-f12a-4382-a05f-53e2e35bdcb0@arm.com>


>> (2) Do we really need "must be part of the same folio", or could we just
>> batch over present PTEs that map consecutive PFNs? In that case, a helper
>> that avoids folio_pte_batch() completely might be better.
>>
> I am not sure I get you here. folio_pte_batch() seems to be the simplest
> thing we can do, as is done elsewhere in the code; I am not aware of any
> alternative.

If we don't need the folio, then we can have a batching function that
doesn't require the folio.

Likely, we could even factor that (non-folio batching) out from folio_pte_batch().
The recent fix [1] might make that easier. See below.


So my question is: is something relying on all of these PTEs to point at the same folio?

[1] https://lkml.kernel.org/r/20250502215019.822-2-arkamar@atlas.cz


Something like this: (would need kerneldoc, probably remove "addr" parameter from folio_pte_batch(),
and look into other related cleanups as discussed with Andrew)


 From f56f67ee5ae9879adb99a8da37fa7ec848c4d256 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Thu, 8 May 2025 12:53:52 +0200
Subject: [PATCH] tmp

Signed-off-by: David Hildenbrand <david@redhat.com>
---
  mm/internal.h | 84 ++++++++++++++++++++++++++++-----------------------
  1 file changed, 46 insertions(+), 38 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 25a29872c634b..53ff8f8a7c8f9 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -217,36 +217,8 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
  	return pte_wrprotect(pte_mkold(pte));
  }
  
-/**
- * folio_pte_batch - detect a PTE batch for a large folio
- * @folio: The large folio to detect a PTE batch for.
- * @addr: The user virtual address the first page is mapped at.
- * @start_ptep: Page table pointer for the first entry.
- * @pte: Page table entry for the first page.
- * @max_nr: The maximum number of table entries to consider.
- * @flags: Flags to modify the PTE batch semantics.
- * @any_writable: Optional pointer to indicate whether any entry except the
- *		  first one is writable.
- * @any_young: Optional pointer to indicate whether any entry except the
- *		  first one is young.
- * @any_dirty: Optional pointer to indicate whether any entry except the
- *		  first one is dirty.
- *
- * Detect a PTE batch: consecutive (present) PTEs that map consecutive
- * pages of the same large folio.
- *
- * All PTEs inside a PTE batch have the same PTE bits set, excluding the PFN,
- * the accessed bit, writable bit, dirty bit (with FPB_IGNORE_DIRTY) and
- * soft-dirty bit (with FPB_IGNORE_SOFT_DIRTY).
- *
- * start_ptep must map any page of the folio. max_nr must be at least one and
- * must be limited by the caller so scanning cannot exceed a single page table.
- *
- * Return: the number of table entries in the batch.
- */
-static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
-		pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
-		bool *any_writable, bool *any_young, bool *any_dirty)
+static inline int pte_batch(pte_t *start_ptep, pte_t pte, int max_nr,
+		fpb_t flags, bool *any_writable, bool *any_young, bool *any_dirty)
  {
  	pte_t expected_pte, *ptep;
  	bool writable, young, dirty;
@@ -259,14 +231,6 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
  	if (any_dirty)
  		*any_dirty = false;
  
-	VM_WARN_ON_FOLIO(!pte_present(pte), folio);
-	VM_WARN_ON_FOLIO(!folio_test_large(folio) || max_nr < 1, folio);
-	VM_WARN_ON_FOLIO(page_folio(pfn_to_page(pte_pfn(pte))) != folio, folio);
-
-	/* Limit max_nr to the actual remaining PFNs in the folio we could batch. */
-	max_nr = min_t(unsigned long, max_nr,
-		       folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte));
-
  	nr = pte_batch_hint(start_ptep, pte);
  	expected_pte = __pte_batch_clear_ignored(pte_advance_pfn(pte, nr), flags);
  	ptep = start_ptep + nr;
@@ -300,6 +264,50 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
  	return min(nr, max_nr);
  }
  
+/**
+ * folio_pte_batch - detect a PTE batch for a large folio
+ * @folio: The large folio to detect a PTE batch for.
+ * @addr: The user virtual address the first page is mapped at.
+ * @start_ptep: Page table pointer for the first entry.
+ * @pte: Page table entry for the first page.
+ * @max_nr: The maximum number of table entries to consider.
+ * @flags: Flags to modify the PTE batch semantics.
+ * @any_writable: Optional pointer to indicate whether any entry except the
+ *		  first one is writable.
+ * @any_young: Optional pointer to indicate whether any entry except the
+ *		  first one is young.
+ * @any_dirty: Optional pointer to indicate whether any entry except the
+ *		  first one is dirty.
+ *
+ * Detect a PTE batch: consecutive (present) PTEs that map consecutive
+ * pages of the same large folio.
+ *
+ * All PTEs inside a PTE batch have the same PTE bits set, excluding the PFN,
+ * the accessed bit, writable bit, dirty bit (with FPB_IGNORE_DIRTY) and
+ * soft-dirty bit (with FPB_IGNORE_SOFT_DIRTY).
+ *
+ * start_ptep must map any page of the folio. max_nr must be at least one and
+ * must be limited by the caller so scanning cannot exceed a single page table.
+ *
+ * Return: the number of table entries in the batch.
+ */
+static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
+		pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
+		bool *any_writable, bool *any_young, bool *any_dirty)
+{
+
+	VM_WARN_ON_FOLIO(!pte_present(pte), folio);
+	VM_WARN_ON_FOLIO(!folio_test_large(folio) || max_nr < 1, folio);
+	VM_WARN_ON_FOLIO(page_folio(pfn_to_page(pte_pfn(pte))) != folio, folio);
+
+	/* Limit max_nr to the actual remaining PFNs in the folio we could batch. */
+	max_nr = min_t(unsigned long, max_nr,
+		       folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte));
+
+	return pte_batch(start_ptep, pte, max_nr, flags, any_writable, any_young,
+			 any_dirty);
+}
+
  /**
   * pte_move_swp_offset - Move the swap entry offset field of a swap pte
   *	 forward or backward by delta
-- 
2.49.0
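
FWIW, a caller on the mremap path that doesn't care about the folio could
then look roughly like this (rough sketch only: the function name is made
up, and which fpb flags are appropriate for mremap is a separate question):

/*
 * Hypothetical caller-side sketch, not part of the patch above: batch over
 * present PTEs that map consecutive PFNs without ever looking up the folio.
 * The caller must ensure [addr, end) does not cross a page table.
 */
static int mremap_pte_batch(pte_t *ptep, pte_t pte, unsigned long addr,
		unsigned long end)
{
	const fpb_t flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
	int max_nr = (end - addr) >> PAGE_SHIFT;

	if (!pte_present(pte) || max_nr == 1)
		return 1;

	/* No page_folio()/vm_normal_folio() lookup needed at all. */
	return pte_batch(ptep, pte, max_nr, flags, NULL, NULL, NULL);
}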


-- 
Cheers,

David / dhildenb




Thread overview: 27+ messages
2025-05-06  5:00 [PATCH 0/3] Optimize mremap() by PTE-batching Dev Jain
2025-05-06  5:00 ` [PATCH 1/3] mm: Call pointers to ptes as ptep Dev Jain
2025-05-06  8:50   ` Anshuman Khandual
2025-05-06  9:05     ` Lorenzo Stoakes
2025-05-06 10:52   ` Lorenzo Stoakes
2025-05-06 11:52     ` Dev Jain
2025-05-06  5:00 ` [PATCH 2/3] mm: Add generic helper to hint a large folio Dev Jain
2025-05-06  9:10   ` Anshuman Khandual
2025-05-06 13:34     ` Lorenzo Stoakes
2025-05-06 15:46   ` Matthew Wilcox
2025-05-07  3:43     ` Dev Jain
2025-05-07 10:03   ` David Hildenbrand
2025-05-08  5:02     ` Dev Jain
2025-05-08 10:55       ` David Hildenbrand [this message]
2025-05-09  5:25         ` Dev Jain
2025-05-09  9:16           ` David Hildenbrand
2025-05-06  5:00 ` [PATCH 3/3] mm: Optimize mremap() by PTE batching Dev Jain
2025-05-06 10:10   ` Anshuman Khandual
2025-05-06 10:20     ` Dev Jain
2025-05-06 13:49   ` Lorenzo Stoakes
2025-05-06 14:03     ` Lorenzo Stoakes
2025-05-06 14:10     ` Dev Jain
2025-05-06 14:14       ` Lorenzo Stoakes
2025-05-06  9:16 ` [PATCH 0/3] Optimize mremap() by PTE-batching Anshuman Khandual
2025-05-06 10:22   ` Dev Jain
2025-05-06 10:44     ` Lorenzo Stoakes
2025-05-06 11:53       ` Dev Jain
