linux-mm.kvack.org archive mirror
* [RFC PATCH 1/2] mm/pgtable: use ptdesc for pmd_huge_pte
@ 2025-12-14  6:55 alexs
  2025-12-14  6:55 ` [RFC PATCH 2/2] mm/pgtable: convert pgtable_trans_huge_withdraw to ptdesc alexs
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: alexs @ 2025-12-14  6:55 UTC
  To: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Alexander Gordeev, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
	Sven Schnelle, David S . Miller, Andreas Larsson, Andrew Morton,
	David Hildenbrand, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Lance Yang, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Gregory Price, Ying Huang, Alistair Popple,
	Thomas Huth, Will Deacon, Matthew Wilcox, Magnus Lindholm,
	linuxppc-dev, linux-s390, sparclinux, linux-mm
  Cc: Alex Shi

From: Alex Shi <alexs@kernel.org>

The 'pmd_huge_pte' fields are page table variables, but the
pgtable_trans_huge_deposit/withdraw functions link them through
'pgtable->lru' instead of 'pgtable->pt_list'. That's a bit weird.

So let's convert ptdesc->pmd_huge_pte and mm->pmd_huge_pte from
pgtable_t to the more precise 'struct ptdesc *', then convert
pgtable_trans_huge_deposit() to take the matching ptdesc.

This conversion works for most archs, but fails on s390/sparc/powerpc
since they use 'pte_t *' as pgtable_t. Are there any suggestions for
these archs? If we can find a solution, we may be able to remove
pgtable_t on the other archs.
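
To make the list handling described above concrete, here is a minimal
user-space sketch of the generic deposit/withdraw logic, linked through
pt_list. It is a toy model under stated assumptions, not the kernel code:
'struct deposit_slot', the 'id' field and the lower-case list helpers are
made up for illustration, and the slot stands in for whatever
pmd_huge_pte(mm, pmdp) names.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

/* Toy descriptor: only the field the deposit list needs. */
struct ptdesc {
	struct list_head pt_list;
	int id;			/* hypothetical tag, for the demo only */
};

/* Hypothetical stand-in for the slot that pmd_huge_pte(mm, pmdp) names. */
struct deposit_slot {
	struct ptdesc *pmd_huge_pte;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert 'entry' right after 'head'. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink 'entry' and leave it self-linked. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = entry;
}

/* Recover the descriptor from its embedded list node. */
static struct ptdesc *entry_of(struct list_head *l)
{
	return (struct ptdesc *)((char *)l - offsetof(struct ptdesc, pt_list));
}

/* Mirrors the shape of the generic deposit: link in, then become the head. */
static void deposit(struct deposit_slot *slot, struct ptdesc *pt)
{
	if (!slot->pmd_huge_pte)
		init_list_head(&pt->pt_list);
	else
		list_add(&pt->pt_list, &slot->pmd_huge_pte->pt_list);
	slot->pmd_huge_pte = pt;
}

/* Mirrors the shape of the generic withdraw: hand back the current head. */
static struct ptdesc *withdraw(struct deposit_slot *slot)
{
	struct ptdesc *pt = slot->pmd_huge_pte;
	struct list_head *first;

	assert(pt != NULL);
	first = pt->pt_list.next;
	/* Next head is NULL when 'pt' was the only entry on the list. */
	slot->pmd_huge_pte = (first == &pt->pt_list) ? NULL : entry_of(first);
	if (slot->pmd_huge_pte)
		list_del(&pt->pt_list);
	return pt;
}
```

In this model every deposited descriptor is handed back exactly once and
the slot ends up NULL again. Nothing in it depends on 'struct page', which
is the point of linking through pt_list.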

Signed-off-by: Alex Shi <alexs@kernel.org>
Cc: linux-mm@kvack.org
Cc: sparclinux@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: Matthew Wilcox  <willy@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christophe Leroy  <chleroy@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/pgtable.h |  6 +++---
 arch/s390/include/asm/pgtable.h              |  2 +-
 arch/s390/mm/pgtable.c                       |  2 +-
 arch/sparc/include/asm/pgtable_64.h          |  2 +-
 arch/sparc/mm/tlb.c                          |  2 +-
 include/linux/mm_types.h                     |  4 ++--
 include/linux/pgtable.h                      |  2 +-
 mm/debug_vm_pgtable.c                        |  3 ++-
 mm/huge_memory.c                             | 16 +++++++++-------
 mm/khugepaged.c                              |  2 +-
 mm/memory.c                                  |  3 ++-
 mm/migrate_device.c                          |  2 +-
 mm/pgtable-generic.c                         | 16 ++++++++--------
 13 files changed, 33 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index aac8ce30cd3b..f10736af296d 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1320,11 +1320,11 @@ pud_t pudp_huge_get_and_clear_full(struct vm_area_struct *vma,
 
 #define __HAVE_ARCH_PGTABLE_DEPOSIT
 static inline void pgtable_trans_huge_deposit(struct mm_struct *mm,
-					      pmd_t *pmdp, pgtable_t pgtable)
+					      pmd_t *pmdp, struct ptdesc *pgtable)
 {
 	if (radix_enabled())
-		return radix__pgtable_trans_huge_deposit(mm, pmdp, pgtable);
-	return hash__pgtable_trans_huge_deposit(mm, pmdp, pgtable);
+		return radix__pgtable_trans_huge_deposit(mm, pmdp, page_ptdesc(pgtable));
+	return hash__pgtable_trans_huge_deposit(mm, pmdp, page_ptdesc(pgtable));
 }
 
 #define __HAVE_ARCH_PGTABLE_WITHDRAW
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index bca9b29778c3..e45cb52a923a 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1751,7 +1751,7 @@ pud_t pudp_xchg_direct(struct mm_struct *, unsigned long, pud_t *, pud_t);
 
 #define __HAVE_ARCH_PGTABLE_DEPOSIT
 void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
-				pgtable_t pgtable);
+				struct ptdesc *pgtable);
 
 #define __HAVE_ARCH_PGTABLE_WITHDRAW
 pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 666adcd681ab..c301af71b3ec 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -520,7 +520,7 @@ EXPORT_SYMBOL(pudp_xchg_direct);
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
-				pgtable_t pgtable)
+				struct ptdesc *pgtable)
 {
 	struct list_head *lh = (struct list_head *) pgtable;
 
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 615f460c50af..4b7f7113a1b3 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -992,7 +992,7 @@ extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 
 #define __HAVE_ARCH_PGTABLE_DEPOSIT
 void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
-				pgtable_t pgtable);
+				struct ptdesc *pgtable);
 
 #define __HAVE_ARCH_PGTABLE_WITHDRAW
 pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
index a35ddcca5e76..5dfee57d2440 100644
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -270,7 +270,7 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 }
 
 void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
-				pgtable_t pgtable)
+				struct ptdesc *pgtable)
 {
 	struct list_head *lh = (struct list_head *) pgtable;
 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 9f6de068295d..674e5fd4cf0d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -577,7 +577,7 @@ struct ptdesc {
 		struct list_head pt_list;
 		struct {
 			unsigned long _pt_pad_1;
-			pgtable_t pmd_huge_pte;
+			struct ptdesc *pmd_huge_pte;
 		};
 	};
 	unsigned long __page_mapping;
@@ -1249,7 +1249,7 @@ struct mm_struct {
 		struct mmu_notifier_subscriptions *notifier_subscriptions;
 #endif
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !defined(CONFIG_SPLIT_PMD_PTLOCKS)
-		pgtable_t pmd_huge_pte; /* protected by page_table_lock */
+		struct ptdesc *pmd_huge_pte; /* protected by page_table_lock */
 #endif
 #ifdef CONFIG_NUMA_BALANCING
 		/*
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 652f287c1ef6..a5b1e3f7452a 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1017,7 +1017,7 @@ static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
 
 #ifndef __HAVE_ARCH_PGTABLE_DEPOSIT
 extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
-				       pgtable_t pgtable);
+				       struct ptdesc *pgtable);
 #endif
 
 #ifndef __HAVE_ARCH_PGTABLE_WITHDRAW
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index ae9b9310d96f..26ff92705558 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -240,7 +240,8 @@ static void __init pmd_advanced_tests(struct pgtable_debug_args *args)
 	/* Align the address wrt HPAGE_PMD_SIZE */
 	vaddr &= HPAGE_PMD_MASK;
 
-	pgtable_trans_huge_deposit(args->mm, args->pmdp, args->start_ptep);
+	pgtable_trans_huge_deposit(args->mm, args->pmdp,
+					page_ptdesc(args->start_ptep));
 
 	pmd = pfn_pmd(args->pmd_pfn, args->page_prot);
 	set_pmd_at(args->mm, vaddr, args->pmdp, pmd);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f7c565f11a98..ff74bd70690d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1352,7 +1352,8 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
 			VM_BUG_ON(ret & VM_FAULT_FALLBACK);
 			return ret;
 		}
-		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
+		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd,
+						page_ptdesc(pgtable));
 		map_anon_folio_pmd_pf(folio, vmf->pmd, vma, haddr);
 		mm_inc_nr_ptes(vma->vm_mm);
 		spin_unlock(vmf->ptl);
@@ -1450,7 +1451,7 @@ static void set_huge_zero_folio(pgtable_t pgtable, struct mm_struct *mm,
 	pmd_t entry;
 	entry = folio_mk_pmd(zero_folio, vma->vm_page_prot);
 	entry = pmd_mkspecial(entry);
-	pgtable_trans_huge_deposit(mm, pmd, pgtable);
+	pgtable_trans_huge_deposit(mm, pmd, page_ptdesc(pgtable));
 	set_pmd_at(mm, haddr, pmd, entry);
 	mm_inc_nr_ptes(mm);
 }
@@ -1576,7 +1577,7 @@ static vm_fault_t insert_pmd(struct vm_area_struct *vma, unsigned long addr,
 	}
 
 	if (pgtable) {
-		pgtable_trans_huge_deposit(mm, pmd, pgtable);
+		pgtable_trans_huge_deposit(mm, pmd, page_ptdesc(pgtable));
 		mm_inc_nr_ptes(mm);
 		pgtable = NULL;
 	}
@@ -1837,7 +1838,7 @@ static void copy_huge_non_present_pmd(
 
 	add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
 	mm_inc_nr_ptes(dst_mm);
-	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
+	pgtable_trans_huge_deposit(dst_mm, dst_pmd, page_ptdesc(pgtable));
 	if (!userfaultfd_wp(dst_vma))
 		pmd = pmd_swp_clear_uffd_wp(pmd);
 	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
@@ -1932,7 +1933,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
 out_zero_page:
 	mm_inc_nr_ptes(dst_mm);
-	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
+	pgtable_trans_huge_deposit(dst_mm, dst_pmd, page_ptdesc(pgtable));
 	pmdp_set_wrprotect(src_mm, addr, src_pmd);
 	if (!userfaultfd_wp(dst_vma))
 		pmd = pmd_clear_uffd_wp(pmd);
@@ -2493,7 +2494,8 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 		if (pmd_move_must_withdraw(new_ptl, old_ptl, vma)) {
 			pgtable_t pgtable;
 			pgtable = pgtable_trans_huge_withdraw(mm, old_pmd);
-			pgtable_trans_huge_deposit(mm, new_pmd, pgtable);
+			pgtable_trans_huge_deposit(mm, new_pmd,
+							page_ptdesc(pgtable));
 		}
 		pmd = move_soft_dirty_pmd(pmd);
 		if (vma_has_uffd_without_event_remap(vma))
@@ -2799,7 +2801,7 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pm
 	set_pmd_at(mm, dst_addr, dst_pmd, _dst_pmd);
 
 	src_pgtable = pgtable_trans_huge_withdraw(mm, src_pmd);
-	pgtable_trans_huge_deposit(mm, dst_pmd, src_pgtable);
+	pgtable_trans_huge_deposit(mm, dst_pmd, page_ptdesc(src_pgtable));
 unlock_ptls:
 	double_pt_unlock(src_ptl, dst_ptl);
 	/* unblock rmap walks */
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 97d1b2824386..f9b1f8e75360 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1228,7 +1228,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 
 	spin_lock(pmd_ptl);
 	BUG_ON(!pmd_none(*pmd));
-	pgtable_trans_huge_deposit(mm, pmd, pgtable);
+	pgtable_trans_huge_deposit(mm, pmd, page_ptdesc(pgtable));
 	map_anon_folio_pmd_nopf(folio, pmd, vma, address);
 	spin_unlock(pmd_ptl);
 
diff --git a/mm/memory.c b/mm/memory.c
index 2a55edc48a65..f777de39cede 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5351,7 +5351,8 @@ static void deposit_prealloc_pte(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 
-	pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, vmf->prealloc_pte);
+	pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd,
+					page_ptdesc(vmf->prealloc_pte));
 	/*
 	 * We are going to consume the prealloc table,
 	 * count that as nr_ptes.
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 23379663b1e1..dd83bfff4f44 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -883,7 +883,7 @@ static int migrate_vma_insert_huge_pmd_page(struct migrate_vma *migrate,
 		flush_cache_page(vma, addr, addr + HPAGE_PMD_SIZE);
 		pmdp_invalidate(vma, addr, pmdp);
 	} else {
-		pgtable_trans_huge_deposit(vma->vm_mm, pmdp, pgtable);
+		pgtable_trans_huge_deposit(vma->vm_mm, pmdp, page_ptdesc(pgtable));
 		mm_inc_nr_ptes(vma->vm_mm);
 	}
 	set_pmd_at(vma->vm_mm, addr, pmdp, entry);
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index d3aec7a9926a..220844a81e38 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -164,15 +164,15 @@ pud_t pudp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address,
 
 #ifndef __HAVE_ARCH_PGTABLE_DEPOSIT
 void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
-				pgtable_t pgtable)
+				struct ptdesc *pgtable)
 {
 	assert_spin_locked(pmd_lockptr(mm, pmdp));
 
 	/* FIFO */
 	if (!pmd_huge_pte(mm, pmdp))
-		INIT_LIST_HEAD(&pgtable->lru);
+		INIT_LIST_HEAD(&pgtable->pt_list);
 	else
-		list_add(&pgtable->lru, &pmd_huge_pte(mm, pmdp)->lru);
+		list_add(&pgtable->pt_list, &pmd_huge_pte(mm, pmdp)->pt_list);
 	pmd_huge_pte(mm, pmdp) = pgtable;
 }
 #endif
@@ -181,17 +181,17 @@ void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
 /* no "address" argument so destroys page coloring of some arch */
 pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
 {
-	pgtable_t pgtable;
+	struct ptdesc *pgtable;
 
 	assert_spin_locked(pmd_lockptr(mm, pmdp));
 
 	/* FIFO */
 	pgtable = pmd_huge_pte(mm, pmdp);
-	pmd_huge_pte(mm, pmdp) = list_first_entry_or_null(&pgtable->lru,
-							  struct page, lru);
+	pmd_huge_pte(mm, pmdp) = list_first_entry_or_null(&pgtable->pt_list,
+							  struct ptdesc, pt_list);
 	if (pmd_huge_pte(mm, pmdp))
-		list_del(&pgtable->lru);
-	return pgtable;
+		list_del(&pgtable->pt_list);
+	return ptdesc_page(pgtable);
 }
 #endif
 
-- 
2.43.0




* [RFC PATCH 2/2] mm/pgtable: convert pgtable_trans_huge_withdraw to ptdesc
  2025-12-14  6:55 [RFC PATCH 1/2] mm/pgtable: use ptdesc for pmd_huge_pte alexs
@ 2025-12-14  6:55 ` alexs
  2025-12-15  0:53 ` [RFC PATCH 1/2] mm/pgtable: use ptdesc for pmd_huge_pte Alex Shi
  2025-12-15  6:06 ` Christophe Leroy (CS GROUP)
  2 siblings, 0 replies; 7+ messages in thread
From: alexs @ 2025-12-14  6:55 UTC
  To: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Alexander Gordeev, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
	Sven Schnelle, David S . Miller, Andreas Larsson, Andrew Morton,
	David Hildenbrand, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Lance Yang, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Gregory Price, Ying Huang, Alistair Popple,
	Thomas Huth, Will Deacon, Matthew Wilcox, Magnus Lindholm,
	linuxppc-dev, linux-s390, sparclinux, linux-mm
  Cc: Alex Shi, Juergen Gross, Jason Gunthorpe

From: Alex Shi <alexs@kernel.org>

Following the previous change to pgtable_trans_huge_deposit(), this
patch converts the return value of pgtable_trans_huge_withdraw() to
'struct ptdesc *'.

In the future, we could go further and convert more pgtable_t users to
struct ptdesc, then replace pgtable_t with 'struct ptdesc *' everywhere
except on the s390/powerpc/sparc archs.

Signed-off-by: Alex Shi <alexs@kernel.org>
Cc: linux-mm@kvack.org
Cc: sparclinux@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Juergen Gross <jgross@suse.com>
Cc: Matthew Wilcox  <willy@infradead.org>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christophe Leroy  <chleroy@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/pgtable.h |  8 ++++----
 arch/s390/include/asm/pgtable.h              |  2 +-
 arch/s390/mm/pgtable.c                       |  4 ++--
 arch/sparc/include/asm/pgtable_64.h          |  2 +-
 arch/sparc/mm/tlb.c                          |  4 ++--
 include/linux/pgtable.h                      |  3 ++-
 mm/huge_memory.c                             | 15 +++++++--------
 mm/pgtable-generic.c                         |  4 ++--
 8 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index f10736af296d..3485de5178b5 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1328,12 +1328,12 @@ static inline void pgtable_trans_huge_deposit(struct mm_struct *mm,
 }
 
 #define __HAVE_ARCH_PGTABLE_WITHDRAW
-static inline pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm,
-						    pmd_t *pmdp)
+static inline struct ptdesc *pgtable_trans_huge_withdraw(struct mm_struct *mm,
+							 pmd_t *pmdp)
 {
 	if (radix_enabled())
-		return radix__pgtable_trans_huge_withdraw(mm, pmdp);
-	return hash__pgtable_trans_huge_withdraw(mm, pmdp);
+		return page_ptdesc(radix__pgtable_trans_huge_withdraw(mm, pmdp));
+	return page_ptdesc(hash__pgtable_trans_huge_withdraw(mm, pmdp));
 }
 
 #define __HAVE_ARCH_PMDP_INVALIDATE
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index e45cb52a923a..5f7fab7b121b 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1754,7 +1754,7 @@ void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
 				struct ptdesc *pgtable);
 
 #define __HAVE_ARCH_PGTABLE_WITHDRAW
-pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
+struct ptdesc *pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
 
 #define  __HAVE_ARCH_PMDP_SET_ACCESS_FLAGS
 static inline int pmdp_set_access_flags(struct vm_area_struct *vma,
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index c301af71b3ec..6e53a30dd3ae 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -534,7 +534,7 @@ void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
 	pmd_huge_pte(mm, pmdp) = pgtable;
 }
 
-pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
+struct ptdesc *pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
 {
 	struct list_head *lh;
 	pgtable_t pgtable;
@@ -555,7 +555,7 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
 	set_pte(ptep, __pte(_PAGE_INVALID));
 	ptep++;
 	set_pte(ptep, __pte(_PAGE_INVALID));
-	return pgtable;
+	return page_ptdesc(pgtable);
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 4b7f7113a1b3..29fc86175300 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -995,7 +995,7 @@ void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
 				struct ptdesc *pgtable);
 
 #define __HAVE_ARCH_PGTABLE_WITHDRAW
-pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
+struct ptdesc *pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
 #endif
 
 /*
diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
index 5dfee57d2440..8b00b62c06bd 100644
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -284,7 +284,7 @@ void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
 	pmd_huge_pte(mm, pmdp) = pgtable;
 }
 
-pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
+struct ptdesc *pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
 {
 	struct list_head *lh;
 	pgtable_t pgtable;
@@ -303,6 +303,6 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
 	pte_val(pgtable[0]) = 0;
 	pte_val(pgtable[1]) = 0;
 
-	return pgtable;
+	return page_ptdesc(pgtable);
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index a5b1e3f7452a..4b20b3d7aaec 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1021,7 +1021,8 @@ extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
 #endif
 
 #ifndef __HAVE_ARCH_PGTABLE_WITHDRAW
-extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
+extern struct ptdesc *pgtable_trans_huge_withdraw(struct mm_struct *mm,
+						  pmd_t *pmdp);
 #endif
 
 #ifndef arch_needs_pgtable_deposit
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ff74bd70690d..6f6cdb3ae888 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2334,7 +2334,7 @@ static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
 {
 	pgtable_t pgtable;
 
-	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
+	pgtable = ptdesc_page(pgtable_trans_huge_withdraw(mm, pmd));
 	pte_free(mm, pgtable);
 	mm_dec_nr_ptes(mm);
 }
@@ -2492,10 +2492,9 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 		VM_BUG_ON(!pmd_none(*new_pmd));
 
 		if (pmd_move_must_withdraw(new_ptl, old_ptl, vma)) {
-			pgtable_t pgtable;
+			struct ptdesc *pgtable;
 			pgtable = pgtable_trans_huge_withdraw(mm, old_pmd);
-			pgtable_trans_huge_deposit(mm, new_pmd,
-							page_ptdesc(pgtable));
+			pgtable_trans_huge_deposit(mm, new_pmd,	pgtable);
 		}
 		pmd = move_soft_dirty_pmd(pmd);
 		if (vma_has_uffd_without_event_remap(vma))
@@ -2710,7 +2709,7 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pm
 	struct page *src_page;
 	struct folio *src_folio;
 	spinlock_t *src_ptl, *dst_ptl;
-	pgtable_t src_pgtable;
+	struct ptdesc *src_pgtable;
 	struct mmu_notifier_range range;
 	int err = 0;
 
@@ -2801,7 +2800,7 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pm
 	set_pmd_at(mm, dst_addr, dst_pmd, _dst_pmd);
 
 	src_pgtable = pgtable_trans_huge_withdraw(mm, src_pmd);
-	pgtable_trans_huge_deposit(mm, dst_pmd, page_ptdesc(src_pgtable));
+	pgtable_trans_huge_deposit(mm, dst_pmd, src_pgtable);
 unlock_ptls:
 	double_pt_unlock(src_ptl, dst_ptl);
 	/* unblock rmap walks */
@@ -2962,7 +2961,7 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
 	 */
 	old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd);
 
-	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
+	pgtable = ptdesc_page(pgtable_trans_huge_withdraw(mm, pmd));
 	pmd_populate(mm, &_pmd, pgtable);
 
 	pte = pte_offset_map(&_pmd, haddr);
@@ -3169,7 +3168,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	 * Withdraw the table only after we mark the pmd entry invalid.
 	 * This's critical for some architectures (Power).
 	 */
-	pgtable = pgtable_trans_huge_withdraw(mm, pmd);
+	pgtable = ptdesc_page(pgtable_trans_huge_withdraw(mm, pmd));
 	pmd_populate(mm, &_pmd, pgtable);
 
 	pte = pte_offset_map(&_pmd, haddr);
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 220844a81e38..a95d9309215e 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -179,7 +179,7 @@ void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
 
 #ifndef __HAVE_ARCH_PGTABLE_WITHDRAW
 /* no "address" argument so destroys page coloring of some arch */
-pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
+struct ptdesc *pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
 {
 	struct ptdesc *pgtable;
 
@@ -191,7 +191,7 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
 							  struct ptdesc, pt_list);
 	if (pmd_huge_pte(mm, pmdp))
 		list_del(&pgtable->pt_list);
-	return ptdesc_page(pgtable);
+	return pgtable;
 }
 #endif
 
-- 
2.43.0




* Re: [RFC PATCH 1/2] mm/pgtable: use ptdesc for pmd_huge_pte
  2025-12-14  6:55 [RFC PATCH 1/2] mm/pgtable: use ptdesc for pmd_huge_pte alexs
  2025-12-14  6:55 ` [RFC PATCH 2/2] mm/pgtable: convert pgtable_trans_huge_withdraw to ptdesc alexs
@ 2025-12-15  0:53 ` Alex Shi
  2025-12-18 10:01   ` David Hildenbrand (Red Hat)
  2025-12-15  6:06 ` Christophe Leroy (CS GROUP)
  2 siblings, 1 reply; 7+ messages in thread
From: Alex Shi @ 2025-12-15  0:53 UTC
  To: alexs, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Alexander Gordeev, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
	Sven Schnelle, David S . Miller, Andreas Larsson, Andrew Morton,
	David Hildenbrand, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Lance Yang, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Gregory Price, Ying Huang, Alistair Popple,
	Thomas Huth, Will Deacon, Matthew Wilcox, Magnus Lindholm,
	linuxppc-dev, linux-s390, sparclinux, linux-mm



On 2025/12/14 14:55, alexs@kernel.org wrote:
> From: Alex Shi <alexs@kernel.org>
> 
> The 'pmd_huge_pte' fields are page table variables, but the
> pgtable_trans_huge_deposit/withdraw functions link them through
> 'pgtable->lru' instead of 'pgtable->pt_list'. That's a bit weird.
> 
> So let's convert ptdesc->pmd_huge_pte and mm->pmd_huge_pte from
> pgtable_t to the more precise 'struct ptdesc *', then convert
> pgtable_trans_huge_deposit() to take the matching ptdesc.
> 
> This conversion works for most archs, but fails on s390/sparc/powerpc
> since they use 'pte_t *' as pgtable_t. Are there any suggestions for
> these archs? If we can find a solution, we may be able to remove
> pgtable_t on the other archs.

If s390/sparc/powerpc can't align pgtable_t with the other archs, we have
to keep pgtable_t to bridge the different types. But we could still take
a step and change pgtable_t to 'struct ptdesc *' on the other archs. That
would simplify and clarify the related code too, wouldn't it?



* Re: [RFC PATCH 1/2] mm/pgtable: use ptdesc for pmd_huge_pte
  2025-12-14  6:55 [RFC PATCH 1/2] mm/pgtable: use ptdesc for pmd_huge_pte alexs
  2025-12-14  6:55 ` [RFC PATCH 2/2] mm/pgtable: convert pgtable_trans_huge_withdraw to ptdesc alexs
  2025-12-15  0:53 ` [RFC PATCH 1/2] mm/pgtable: use ptdesc for pmd_huge_pte Alex Shi
@ 2025-12-15  6:06 ` Christophe Leroy (CS GROUP)
  2025-12-15 14:26   ` Alex Shi
  2 siblings, 1 reply; 7+ messages in thread
From: Christophe Leroy (CS GROUP) @ 2025-12-15  6:06 UTC
  To: alexs, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Sven Schnelle,
	David S . Miller, Andreas Larsson, Andrew Morton,
	David Hildenbrand, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Lance Yang, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Gregory Price, Ying Huang, Alistair Popple,
	Thomas Huth, Will Deacon, Matthew Wilcox, Magnus Lindholm,
	linuxppc-dev, linux-s390, sparclinux, linux-mm



On 14/12/2025 at 07:55, alexs@kernel.org wrote:
> From: Alex Shi <alexs@kernel.org>
> 
> The 'pmd_huge_pte' fields are page table variables, but the
> pgtable_trans_huge_deposit/withdraw functions link them through
> 'pgtable->lru' instead of 'pgtable->pt_list'. That's a bit weird.
> 
> So let's convert ptdesc->pmd_huge_pte and mm->pmd_huge_pte from
> pgtable_t to the more precise 'struct ptdesc *', then convert
> pgtable_trans_huge_deposit() to take the matching ptdesc.
> 
> This conversion works for most archs, but fails on s390/sparc/powerpc
> since they use 'pte_t *' as pgtable_t. Are there any suggestions for
> these archs? If we can find a solution, we may be able to remove
> pgtable_t on the other archs.

The use of struct ptdesc * assumes that a page table is contained in one
(or several) page(s).

On powerpc, there can be several page tables in one page. For instance,
on powerpc 8xx the hardware requires page tables to be 4k at all times,
although page sizes can be either 4k or 16k. So in the 16k case there
are 4 page tables in one page.
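
The arithmetic behind that can be sketched in a few lines of user-space C.
This is only an illustration of the 4k-table-in-16k-page case described
above; the macro and function names are made up, and nothing here is
taken from the powerpc code.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_16K	(16u * 1024)	/* page size in the 16k configuration */
#define PT_SIZE_4K	(4u * 1024)	/* page table size quoted for 8xx */

/* Number of page tables that fit in one page. */
static unsigned int tables_per_page(unsigned int page_size,
				    unsigned int pt_size)
{
	assert(pt_size != 0 && page_size % pt_size == 0);
	return page_size / pt_size;
}

/* Index of the page containing 'addr'; a per-page descriptor is keyed on it. */
static uintptr_t page_index(uintptr_t addr, unsigned int page_size)
{
	return addr / page_size;
}

/* Which of the page's tables 'addr' falls into. */
static unsigned int table_slot(uintptr_t addr, unsigned int page_size,
			       unsigned int pt_size)
{
	return (unsigned int)((addr % page_size) / pt_size);
}
```

With these helpers, four consecutive 4k tables starting at a 16k-aligned
address all share one page index and differ only in their slot, so a
per-page descriptor alone cannot tell them apart.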

There is some logic in arch/powerpc/mm/pgtable-frag.c to handle that, but
this is only for the last levels (PTs and PMDs). For the other levels
kmem_cache_alloc() is used to provide a PxD of the right size. Maybe the
solution is to convert all levels to use pgtable-frag, but this
doesn't look trivial. It should probably be done at core level, not at
arch level.

Christophe

> 
> Signed-off-by: Alex Shi <alexs@kernel.org>
> ---
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index aac8ce30cd3b..f10736af296d 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -1320,11 +1320,11 @@ pud_t pudp_huge_get_and_clear_full(struct vm_area_struct *vma,
>   
>   #define __HAVE_ARCH_PGTABLE_DEPOSIT
>   static inline void pgtable_trans_huge_deposit(struct mm_struct *mm,
> -					      pmd_t *pmdp, pgtable_t pgtable)
> +					      pmd_t *pmdp, struct ptdesc *pgtable)
>   {
>   	if (radix_enabled())
> -		return radix__pgtable_trans_huge_deposit(mm, pmdp, pgtable);
> -	return hash__pgtable_trans_huge_deposit(mm, pmdp, pgtable);
> +		return radix__pgtable_trans_huge_deposit(mm, pmdp, page_ptdesc(pgtable));
> +	return hash__pgtable_trans_huge_deposit(mm, pmdp, page_ptdesc(pgtable));
>   }
>   

I can't understand this change.

pgtable is a pointer to a page table, and you want to replace it with
something that returns a pointer to a struct page. How can that work?
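
The mismatch can be modelled in user space. The sketch below is a toy
under stated assumptions: 'toy_ram', 'toy_map' and all toy_* helpers are
invented for illustration, pages are assumed to be 16k with 4k tables,
and pgtable_t is the 'pte_t *' flavour. Going from a table pointer to a
descriptor needs an address lookup, and going back only recovers the
start of the page, not the table that was passed in.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE	16384u		/* assumed 16k pages */
#define TOY_TABLE_SIZE	4096u		/* assumed 4k page tables */
#define TOY_NR_PAGES	4u

typedef struct { unsigned long val; } pte_t;
typedef pte_t *pgtable_t;		/* the 'pte_t *' flavour of pgtable_t */

#define TOY_PTES_PER_PAGE	(TOY_PAGE_SIZE / sizeof(pte_t))
#define TOY_PTES_PER_TABLE	(TOY_TABLE_SIZE / sizeof(pte_t))

/* Hypothetical per-page descriptor: exactly one per page of toy_ram. */
struct toy_ptdesc {
	unsigned int unused;
};

/* Toy "RAM" holding page tables, plus its per-page descriptor array. */
static pte_t toy_ram[TOY_NR_PAGES * TOY_PTES_PER_PAGE];
static struct toy_ptdesc toy_map[TOY_NR_PAGES];

/* Address of table number 'slot' inside page number 'page'. */
static pgtable_t toy_table(size_t page, size_t slot)
{
	assert(page < TOY_NR_PAGES);
	assert(slot < TOY_PTES_PER_PAGE / TOY_PTES_PER_TABLE);
	return &toy_ram[page * TOY_PTES_PER_PAGE + slot * TOY_PTES_PER_TABLE];
}

/* Table pointer -> descriptor of the page that contains it. */
static struct toy_ptdesc *toy_table_to_ptdesc(pgtable_t table)
{
	size_t idx = (size_t)(table - toy_ram);

	assert(idx < TOY_NR_PAGES * TOY_PTES_PER_PAGE);
	return &toy_map[idx / TOY_PTES_PER_PAGE];
}

/* Descriptor -> start of its page; the slot inside the page is lost. */
static pgtable_t toy_ptdesc_to_table(const struct toy_ptdesc *pt)
{
	return &toy_ram[(size_t)(pt - toy_map) * TOY_PTES_PER_PAGE];
}
```

Here toy_table(1, 0) and toy_table(1, 2) map to the same descriptor, and
converting that descriptor back yields toy_table(1, 0) in both cases, so
the round trip is lossy whenever a page holds more than one table.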

Christophe



* Re: [RFC PATCH 1/2] mm/pgtable: use ptdesc for pmd_huge_pte
  2025-12-15  6:06 ` Christophe Leroy (CS GROUP)
@ 2025-12-15 14:26   ` Alex Shi
  0 siblings, 0 replies; 7+ messages in thread
From: Alex Shi @ 2025-12-15 14:26 UTC
  To: Christophe Leroy (CS GROUP),
	alexs, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Sven Schnelle,
	David S . Miller, Andreas Larsson, Andrew Morton,
	David Hildenbrand, Lorenzo Stoakes, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Lance Yang, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Gregory Price, Ying Huang, Alistair Popple,
	Thomas Huth, Will Deacon, Matthew Wilcox, Magnus Lindholm,
	linuxppc-dev, linux-s390, sparclinux, linux-mm



On 2025/12/15 14:06, Christophe Leroy (CS GROUP) wrote:
> 
> Le 14/12/2025 à 07:55, alexs@kernel.org a écrit :
>> From: Alex Shi <alexs@kernel.org>
>>
>> 'pmd_huge_pte' are pgtable variables, but used 'pgtable->lru'
>> instead of pgtable->pt_list in pgtable_trans_huge_deposit/withdraw
>> functions, That's a bit weird.
>>
>> So let's convert the pgtable_t to precise 'struct ptdesc *' for
>> ptdesc->pmd_huge_pte, and mm->pmd_huge_pte, then convert function
>> pgtable_trans_huge_deposit() to use correct ptdesc.
>>
>> This convertion works for most of arch, but failed on s390/sparc/powerpc
>> since they use 'pte_t *' as pgtable_t. Is there any suggestion for these
>> archs? If we could have a solution, we may remove the pgtable_t for other
>> archs.
> 
> The use of struct ptdesc * assumes that a pagetable is contained in one 
> (or several) page(s).
> 
> On powerpc, there can be several page tables in one page. For instance, 
> on powerpc 8xx the hardware require page tables to be 4k at all time, 
> allthough page sizes can be either 4k or 16k. So in the 16k case there 
> are 4 pages tables in one page.

Hi Christophe,

Thanks a lot for the info.

> 
> There is some logic in arch/powerpc/mm/pgtable-frag.c to handle that but 
> this is only for last levels (PTs and PMDs). For other levels 
> kmem_cache_alloc() is used to provide a PxD of the right size. Maybe the 
> solution is to convert all levels to using pgtable-frag, but this 
> doesn't look trivial. Probably it should be done at core level not at 
> arch level.

Glad to hear there is an idea for this. Would you be willing to explain 
your idea in more detail?

Thanks a lot

> 
> Christophe
> 
>>
>> Signed-off-by: Alex Shi <alexs@kernel.org>
>> ---
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/ 
>> powerpc/include/asm/book3s/64/pgtable.h
>> index aac8ce30cd3b..f10736af296d 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -1320,11 +1320,11 @@ pud_t pudp_huge_get_and_clear_full(struct 
>> vm_area_struct *vma,
>>   #define __HAVE_ARCH_PGTABLE_DEPOSIT
>>   static inline void pgtable_trans_huge_deposit(struct mm_struct *mm,
>> -                          pmd_t *pmdp, pgtable_t pgtable)
>> +                          pmd_t *pmdp, struct ptdesc *pgtable)
>>   {
>>       if (radix_enabled())
>> -        return radix__pgtable_trans_huge_deposit(mm, pmdp, pgtable);
>> -    return hash__pgtable_trans_huge_deposit(mm, pmdp, pgtable);
>> +        return radix__pgtable_trans_huge_deposit(mm, pmdp, 
>> page_ptdesc(pgtable));
>> +    return hash__pgtable_trans_huge_deposit(mm, pmdp, 
>> page_ptdesc(pgtable));
>>   }
> 
> I can't understand this change.
> 
> pgtable is a pointer to a page table, and you want to replace it to 
> something that returns a pointer to a struct page, how can it work ?

Sorry for the confusion. Right, it can't work, as I mentioned in the 
commit log. I just wanted to raise the issue, in the hope that the 
experts on these archs can suggest some ideas.

Thanks



* Re: [RFC PATCH 1/2] mm/pgtable: use ptdesc for pmd_huge_pte
  2025-12-15  0:53 ` [RFC PATCH 1/2] mm/pgtable: use ptdesc for pmd_huge_pte Alex Shi
@ 2025-12-18 10:01   ` David Hildenbrand (Red Hat)
  2025-12-18 14:16     ` Alex Shi
  0 siblings, 1 reply; 7+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-18 10:01 UTC (permalink / raw)
  To: Alex Shi, alexs, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, David S . Miller,
	Andreas Larsson, Andrew Morton, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Zi Yan, Baolin Wang,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, Thomas Huth,
	Will Deacon, Matthew Wilcox, Magnus Lindholm, linuxppc-dev,
	linux-s390, sparclinux, linux-mm

On 12/15/25 01:53, Alex Shi wrote:
> 
> 
> On 2025/12/14 14:55, alexs@kernel.org wrote:
>> From: Alex Shi<alexs@kernel.org>
>>
>> 'pmd_huge_pte' are pgtable variables, but used 'pgtable->lru'
>> instead of pgtable->pt_list in pgtable_trans_huge_deposit/withdraw
>> functions, That's a bit weird.
>>
>> So let's convert the pgtable_t to precise 'struct ptdesc *' for
>> ptdesc->pmd_huge_pte, and mm->pmd_huge_pte, then convert function
>> pgtable_trans_huge_deposit() to use correct ptdesc.
>>
>> This convertion works for most of arch, but failed on s390/sparc/powerpc
>> since they use 'pte_t *' as pgtable_t. Is there any suggestion for these
>> archs? If we could have a solution, we may remove the pgtable_t for other
>> archs.
> 
> If s390/sparc/powerpc can't align pgtable_t with others, we have to keep
> the pgtable_t to bridge different types. But we could take step to
> change pgtable_t as 'struct ptdesc *' in other archs. That could
> simplify and clarify related code too, isn't it?

Not sure. s390 and friends squeeze multiple actual page tables into a 
single page and that single page has a single ptdesc.

I was rather hoping that we can make the code more consistent by making 
everybody just point at the start of the page table? (that is, make it 
consistent for all, not use ptdesc for some and pte_t * for others)
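
As a rough illustration of what "point at the start of the page table" could look like, here is a user-space sketch under stated assumptions; it is not the proposed kernel change. The handle is a plain pointer to the first entry of a table, and a deposited (unused) table lends its first bytes to hold the link to the next deposited table, so no per-page descriptor field is needed and two tables sharing a page remain distinguishable. All names (sketch_pte_t, sketch_mm, sketch_deposit(), sketch_withdraw()) are hypothetical, and the LIFO order is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; not the kernel's pte_t or mm_struct. */
#define SKETCH_PTRS_PER_TABLE 512
typedef struct { unsigned long val; } sketch_pte_t;

struct sketch_mm {
	sketch_pte_t *deposited;	/* most recently deposited table */
};

/* Deposit a table, identified only by the address of its first entry. */
void sketch_deposit(struct sketch_mm *mm, sketch_pte_t *table)
{
	assert(table != NULL);
	/* Borrow the first bytes of the unused table for the link. */
	memcpy(table, &mm->deposited, sizeof(mm->deposited));
	mm->deposited = table;
}

/* Withdraw the most recently deposited table, or NULL if none is left. */
sketch_pte_t *sketch_withdraw(struct sketch_mm *mm)
{
	sketch_pte_t *table = mm->deposited;

	if (table == NULL)
		return NULL;
	/* Restore the previous head, then clear the borrowed bytes. */
	memcpy(&mm->deposited, table, sizeof(mm->deposited));
	memset(table, 0, sizeof(mm->deposited));
	return table;
}
```

Because the handle is the table's own address, the same code works whether a page holds one table or several; a per-page descriptor can still be looked up from that address when one is needed.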

-- 
Cheers

David



* Re: [RFC PATCH 1/2] mm/pgtable: use ptdesc for pmd_huge_pte
  2025-12-18 10:01   ` David Hildenbrand (Red Hat)
@ 2025-12-18 14:16     ` Alex Shi
  0 siblings, 0 replies; 7+ messages in thread
From: Alex Shi @ 2025-12-18 14:16 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat),
	alexs, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Alexander Gordeev, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
	Sven Schnelle, David S . Miller, Andreas Larsson, Andrew Morton,
	Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Zi Yan,
	Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Gregory Price, Ying Huang, Alistair Popple,
	Thomas Huth, Will Deacon, Matthew Wilcox, Magnus Lindholm,
	linuxppc-dev, linux-s390, sparclinux, linux-mm



On 2025/12/18 18:01, David Hildenbrand (Red Hat) wrote:
> On 12/15/25 01:53, Alex Shi wrote:
>>
>>
>> On 2025/12/14 14:55, alexs@kernel.org wrote:
>>> From: Alex Shi<alexs@kernel.org>
>>>
>>> 'pmd_huge_pte' are pgtable variables, but used 'pgtable->lru'
>>> instead of pgtable->pt_list in pgtable_trans_huge_deposit/withdraw
>>> functions, That's a bit weird.
>>>
>>> So let's convert the pgtable_t to precise 'struct ptdesc *' for
>>> ptdesc->pmd_huge_pte, and mm->pmd_huge_pte, then convert function
>>> pgtable_trans_huge_deposit() to use correct ptdesc.
>>>
>>> This convertion works for most of arch, but failed on s390/sparc/powerpc
>>> since they use 'pte_t *' as pgtable_t. Is there any suggestion for these
>>> archs? If we could have a solution, we may remove the pgtable_t for 
>>> other
>>> archs.
>>
>> If s390/sparc/powerpc can't align pgtable_t with others, we have to keep
>> the pgtable_t to bridge different types. But we could take step to
>> change pgtable_t as 'struct ptdesc *' in other archs. That could
>> simplify and clarify related code too, isn't it?
> 
> Not sure. s390 and friends squeeze multiple actual page tables into a 
> single page and that single page has a single ptdesc.
> 
> I was rather hoping that we can make the code more consistent by making 
> everybody just point at the start of the page table? (that is, make it 
> consistent for all, not use ptdesc for some and pte_t * for others)
> 

Got it. It would be great if the maintainers of these archs were willing to work on this.

Thanks
Alex



end of thread, other threads:[~2025-12-18 14:16 UTC | newest]

Thread overview: 7+ messages
2025-12-14  6:55 [RFC PATCH 1/2] mm/pgtable: use ptdesc for pmd_huge_pte alexs
2025-12-14  6:55 ` [RFC PATCH 2/2] mm/pgtable: convert pgtable_trans_huge_withdraw to ptdesc alexs
2025-12-15  0:53 ` [RFC PATCH 1/2] mm/pgtable: use ptdesc for pmd_huge_pte Alex Shi
2025-12-18 10:01   ` David Hildenbrand (Red Hat)
2025-12-18 14:16     ` Alex Shi
2025-12-15  6:06 ` Christophe Leroy (CS GROUP)
2025-12-15 14:26   ` Alex Shi
