linux-mm.kvack.org archive mirror
* [PATCH 0/4] Convert pgtable to use frozen pages
@ 2025-11-13 14:04 Matthew Wilcox (Oracle)
  2025-11-13 14:04 ` [PATCH 1/4] mm: Use frozen pages for page tables Matthew Wilcox (Oracle)
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2025-11-13 14:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox (Oracle), David Hildenbrand, Vishal Moola, linux-mm

Page tables do not use the refcount in struct page and have a
straightforward alloc/free model rather than get/put (even on
architectures which have multiple page tables in a single page).
That means we can use frozen pages, which are slightly more efficient, and
confirms that we won't need to include a refcount as part of the ptdesc
when it's separately allocated.

While I'm looking at this, simplify the constructor/destructor machinery
and remove an aliasing use of page->lru instead of ptdesc->pt_list.

Based on next-20251110.

Matthew Wilcox (Oracle) (4):
  mm: Use frozen pages for page tables
  mm: Account pagetable memory when allocated
  mm: Mark pagetable memory when allocated
  pgtable: Remove uses of page->lru

 include/linux/mm.h       | 75 ++++------------------------------------
 include/linux/mm_types.h |  1 +
 mm/memory.c              | 46 ++++++++++++++++++++++++
 mm/pgtable-generic.c     | 27 +++++++++------
 4 files changed, 70 insertions(+), 79 deletions(-)

-- 
2.47.2



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/4] mm: Use frozen pages for page tables
  2025-11-13 14:04 [PATCH 0/4] Convert pgtable to use frozen pages Matthew Wilcox (Oracle)
@ 2025-11-13 14:04 ` Matthew Wilcox (Oracle)
  2025-11-13 18:24   ` Vishal Moola (Oracle)
                     ` (2 more replies)
  2025-11-13 14:04 ` [PATCH 2/4] mm: Account pagetable memory when allocated Matthew Wilcox (Oracle)
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2025-11-13 14:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox (Oracle), David Hildenbrand, Vishal Moola, linux-mm

Page tables do not use the reference count.  That means we can avoid
two atomic operations (one on alloc, one on free) by allocating frozen
pages here.  This does not interfere with compaction as page tables are
non-movable allocations.

pagetable_alloc() and pagetable_free() need to move out of line to make
this work as alloc_frozen_page() and free_frozen_page() are not exported
outside the mm for now.  We'll want them out of line anyway soon.
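The saving can be illustrated with a minimal userspace sketch. It assumes a
toy `struct toy_page` that models only the reference count; all `toy_*`
names are hypothetical and none of this is kernel code. The refcounted pair
performs one refcount update on allocation and one on free, while the frozen
pair performs none. The frozen free path also refuses a page that still holds
a reference, loosely mirroring the "nonzero _refcount" bad-page report that
appears later in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Toy stand-in for struct page: only the reference count is modelled. */
struct toy_page {
	atomic_int refcount;
};

/* Counts refcount updates, i.e. the atomics a frozen allocation avoids. */
static int toy_atomic_ops;

/* Like alloc_pages(): the page is handed out with refcount 1. */
static struct toy_page *toy_alloc_pages(void)
{
	struct toy_page *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	atomic_init(&p->refcount, 0);
	atomic_store(&p->refcount, 1);		/* refcount update on alloc */
	toy_atomic_ops++;
	return p;
}

/* Like __free_pages(): drop our reference, free when it reaches zero. */
static void toy_free_pages(struct toy_page *p)
{
	assert(atomic_load(&p->refcount) > 0);
	toy_atomic_ops++;			/* refcount update on free */
	if (atomic_fetch_sub(&p->refcount, 1) == 1)
		free(p);
}

/* Like alloc_frozen_pages(): refcount stays 0, no atomic update. */
static struct toy_page *toy_alloc_frozen_pages(void)
{
	struct toy_page *p = malloc(sizeof(*p));

	if (p)
		atomic_init(&p->refcount, 0);
	return p;
}

/*
 * Like free_frozen_pages(): the caller must hold no reference.  In this toy,
 * a nonzero count is reported by returning -1 and the page is not freed.
 */
static int toy_free_frozen_pages(struct toy_page *p)
{
	if (atomic_load(&p->refcount) != 0)
		return -1;
	free(p);
	return 0;
}
```

For example, `toy_free_pages(toy_alloc_pages())` leaves `toy_atomic_ops` at
2, whereas the frozen pair leaves it untouched; passing a `toy_alloc_pages()`
result to `toy_free_frozen_pages()` returns -1.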

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h   | 53 +++++---------------------------------------
 mm/memory.c          | 34 ++++++++++++++++++++++++++++
 mm/pgtable-generic.c |  3 ++-
 3 files changed, 42 insertions(+), 48 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5087deecdd9c..e168ee23091e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2995,58 +2995,17 @@ static inline void ptdesc_clear_kernel(struct ptdesc *ptdesc)
  */
 static inline bool ptdesc_test_kernel(const struct ptdesc *ptdesc)
 {
+#ifdef CONFIG_ASYNC_KERNEL_PGTABLE_FREE
 	return test_bit(PT_kernel, &ptdesc->pt_flags.f);
+#else
+	return false;
+#endif
 }
 
-/**
- * pagetable_alloc - Allocate pagetables
- * @gfp:    GFP flags
- * @order:  desired pagetable order
- *
- * pagetable_alloc allocates memory for page tables as well as a page table
- * descriptor to describe that memory.
- *
- * Return: The ptdesc describing the allocated page tables.
- */
-static inline struct ptdesc *pagetable_alloc_noprof(gfp_t gfp, unsigned int order)
-{
-	struct page *page = alloc_pages_noprof(gfp | __GFP_COMP, order);
-
-	return page_ptdesc(page);
-}
+struct ptdesc *pagetable_alloc_noprof(gfp_t gfp, unsigned int order);
 #define pagetable_alloc(...)	alloc_hooks(pagetable_alloc_noprof(__VA_ARGS__))
-
-static inline void __pagetable_free(struct ptdesc *pt)
-{
-	struct page *page = ptdesc_page(pt);
-
-	__free_pages(page, compound_order(page));
-}
-
-#ifdef CONFIG_ASYNC_KERNEL_PGTABLE_FREE
+void pagetable_free(struct ptdesc *pt);
 void pagetable_free_kernel(struct ptdesc *pt);
-#else
-static inline void pagetable_free_kernel(struct ptdesc *pt)
-{
-	__pagetable_free(pt);
-}
-#endif
-/**
- * pagetable_free - Free pagetables
- * @pt:	The page table descriptor
- *
- * pagetable_free frees the memory of all page tables described by a page
- * table descriptor and the memory for the descriptor itself.
- */
-static inline void pagetable_free(struct ptdesc *pt)
-{
-	if (ptdesc_test_kernel(pt)) {
-		ptdesc_clear_kernel(pt);
-		pagetable_free_kernel(pt);
-	} else {
-		__pagetable_free(pt);
-	}
-}
 
 #if defined(CONFIG_SPLIT_PTE_PTLOCKS)
 #if ALLOC_SPLIT_PTLOCKS
diff --git a/mm/memory.c b/mm/memory.c
index 1c66ee83a7ab..781cd7f607f7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -7338,6 +7338,40 @@ long copy_folio_from_user(struct folio *dst_folio,
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
 
+/**
+ * pagetable_alloc - Allocate pagetables
+ * @gfp:    GFP flags
+ * @order:  desired pagetable order
+ *
+ * pagetable_alloc allocates memory for page tables as well as a page table
+ * descriptor to describe that memory.
+ *
+ * Return: The ptdesc describing the allocated page tables.
+ */
+struct ptdesc *pagetable_alloc_noprof(gfp_t gfp, unsigned int order)
+{
+	struct page *page = alloc_frozen_pages_noprof(gfp | __GFP_COMP, order);
+
+	return page_ptdesc(page);
+}
+
+/**
+ * pagetable_free - Free pagetables
+ * @pt:	The page table descriptor
+ *
+ * pagetable_free frees the memory of all page tables described by a page
+ * table descriptor and the memory for the descriptor itself.
+ */
+void pagetable_free(struct ptdesc *pt)
+{
+	struct page *page = ptdesc_page(pt);
+
+	if (ptdesc_test_kernel(pt))
+		pagetable_free_kernel(pt);
+	else
+		free_frozen_pages(page, compound_order(page));
+}
+
 #if defined(CONFIG_SPLIT_PTE_PTLOCKS) && ALLOC_SPLIT_PTLOCKS
 
 static struct kmem_cache *page_ptl_cachep;
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index d3aec7a9926a..597049e21ac1 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -434,11 +434,12 @@ static void kernel_pgtable_work_func(struct work_struct *work)
 
 	iommu_sva_invalidate_kva_range(PAGE_OFFSET, TLB_FLUSH_ALL);
 	list_for_each_entry_safe(pt, next, &page_list, pt_list)
-		__pagetable_free(pt);
+		pagetable_free(pt);
 }
 
 void pagetable_free_kernel(struct ptdesc *pt)
 {
+	ptdesc_clear_kernel(pt);
 	spin_lock(&kernel_pgtable_work.lock);
 	list_add(&pt->pt_list, &kernel_pgtable_work.list);
 	spin_unlock(&kernel_pgtable_work.lock);
-- 
2.47.2




* [PATCH 2/4] mm: Account pagetable memory when allocated
  2025-11-13 14:04 [PATCH 0/4] Convert pgtable to use frozen pages Matthew Wilcox (Oracle)
  2025-11-13 14:04 ` [PATCH 1/4] mm: Use frozen pages for page tables Matthew Wilcox (Oracle)
@ 2025-11-13 14:04 ` Matthew Wilcox (Oracle)
  2025-11-13 19:39   ` Vishal Moola (Oracle)
  2025-11-13 14:04 ` [PATCH 3/4] mm: Mark " Matthew Wilcox (Oracle)
  2025-11-13 14:04 ` [PATCH 4/4] pgtable: Remove uses of page->lru Matthew Wilcox (Oracle)
  3 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2025-11-13 14:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox (Oracle), David Hildenbrand, Vishal Moola, linux-mm

Move the accounting from the constructor to the allocation site.
Some of the architecture code is a little complex to reason about,
but I think this is all correct (and slightly more efficient due
to having 'order' as an argument instead of having to retrieve it
from struct page again).
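As a rough illustration of accounting at the allocation site, here is a
userspace sketch with hypothetical names (`toy_pagetable_alloc()`,
`toy_nr_pagetable[]`). Unlike the patch, which recomputes the order with
compound_order() on free, this toy simply stores the order and node in its
descriptor. The point is that the allocator adds 2^order pages to the node's
counter only on success, and the free path subtracts the same amount.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_NR_NODES 2

/* Toy per-node NR_PAGETABLE counters, in pages. */
static long toy_nr_pagetable[TOY_NR_NODES];

/* Toy descriptor: remembers where the memory came from and how big it is. */
struct toy_ptdesc {
	int nid;
	unsigned int order;
};

/* Account at the allocation site: 'order' is an argument, so no lookup. */
static struct toy_ptdesc *toy_pagetable_alloc(int nid, unsigned int order)
{
	struct toy_ptdesc *pt;

	assert(nid >= 0 && nid < TOY_NR_NODES);
	pt = malloc(sizeof(*pt));
	if (!pt)
		return NULL;		/* failed allocations are not accounted */
	pt->nid = nid;
	pt->order = order;
	toy_nr_pagetable[nid] += 1L << order;
	return pt;
}

/* Undo exactly what the allocation added, then release the memory. */
static void toy_pagetable_free(struct toy_ptdesc *pt)
{
	toy_nr_pagetable[pt->nid] -= 1L << pt->order;
	free(pt);
}
```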

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h | 11 -----------
 mm/memory.c        | 16 +++++++++++++---
 2 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e168ee23091e..17f783c04c87 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3082,26 +3082,15 @@ static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
 static inline void ptlock_free(struct ptdesc *ptdesc) {}
 #endif /* defined(CONFIG_SPLIT_PTE_PTLOCKS) */
 
-static inline unsigned long ptdesc_nr_pages(const struct ptdesc *ptdesc)
-{
-	return compound_nr(ptdesc_page(ptdesc));
-}
-
 static inline void __pagetable_ctor(struct ptdesc *ptdesc)
 {
-	pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags));
-
 	__SetPageTable(ptdesc_page(ptdesc));
-	mod_node_page_state(pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc));
 }
 
 static inline void pagetable_dtor(struct ptdesc *ptdesc)
 {
-	pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags));
-
 	ptlock_free(ptdesc);
 	__ClearPageTable(ptdesc_page(ptdesc));
-	mod_node_page_state(pgdat, NR_PAGETABLE, -ptdesc_nr_pages(ptdesc));
 }
 
 static inline void pagetable_dtor_free(struct ptdesc *ptdesc)
diff --git a/mm/memory.c b/mm/memory.c
index 781cd7f607f7..35886fde189c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -7351,7 +7351,13 @@ long copy_folio_from_user(struct folio *dst_folio,
 struct ptdesc *pagetable_alloc_noprof(gfp_t gfp, unsigned int order)
 {
 	struct page *page = alloc_frozen_pages_noprof(gfp | __GFP_COMP, order);
+	pg_data_t *pgdat;
 
+	if (!page)
+		return NULL;
+
+	pgdat = NODE_DATA(page_to_nid(page));
+	mod_node_page_state(pgdat, NR_PAGETABLE, 1 << order);
 	return page_ptdesc(page);
 }
 
@@ -7364,12 +7370,16 @@ struct ptdesc *pagetable_alloc_noprof(gfp_t gfp, unsigned int order)
  */
 void pagetable_free(struct ptdesc *pt)
 {
+	pg_data_t *pgdat = NODE_DATA(memdesc_nid(pt->pt_flags));
 	struct page *page = ptdesc_page(pt);
+	unsigned int order = compound_order(page);
 
-	if (ptdesc_test_kernel(pt))
+	if (ptdesc_test_kernel(pt)) {
 		pagetable_free_kernel(pt);
-	else
-		free_frozen_pages(page, compound_order(page));
+		return;
+	}
+	mod_node_page_state(pgdat, NR_PAGETABLE, -(1L << order));
+	free_frozen_pages(page, order);
 }
 
 #if defined(CONFIG_SPLIT_PTE_PTLOCKS) && ALLOC_SPLIT_PTLOCKS
-- 
2.47.2




* [PATCH 3/4] mm: Mark pagetable memory when allocated
  2025-11-13 14:04 [PATCH 0/4] Convert pgtable to use frozen pages Matthew Wilcox (Oracle)
  2025-11-13 14:04 ` [PATCH 1/4] mm: Use frozen pages for page tables Matthew Wilcox (Oracle)
  2025-11-13 14:04 ` [PATCH 2/4] mm: Account pagetable memory when allocated Matthew Wilcox (Oracle)
@ 2025-11-13 14:04 ` Matthew Wilcox (Oracle)
  2025-11-18 17:00   ` David Hildenbrand (Red Hat)
  2025-11-13 14:04 ` [PATCH 4/4] pgtable: Remove uses of page->lru Matthew Wilcox (Oracle)
  3 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2025-11-13 14:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox (Oracle), David Hildenbrand, Vishal Moola, linux-mm

Move the page type setting from the constructor to the allocation site.
Some of the architecture code is a little complex to reason about, but
I think this is all correct.  This makes __pagetable_ctor() empty, so
remove it.  While pagetable_pud_ctor() and higher levels are now empty,
leave them alone as there may be a call to have them do something in future.
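A toy model of the same move for the page type, under the assumption that a
single enum field can stand in for the kernel's page-type encoding (it does
not match it in detail; all `toy_*` names are hypothetical). The mark is set
by the allocator and cleared by the free path, so an empty constructor such
as the PUD one no longer affects whether the page reads as a page table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy page-type values; the kernel's encoding differs. */
enum toy_page_type { TOY_PGTY_NONE, TOY_PGTY_TABLE };

struct toy_page {
	enum toy_page_type type;
};

static bool toy_page_is_table(const struct toy_page *p)
{
	return p->type == TOY_PGTY_TABLE;
}

/* The allocation site marks the page; no constructor is needed for this. */
static struct toy_page *toy_pagetable_alloc(void)
{
	struct toy_page *p = malloc(sizeof(*p));

	if (p)
		p->type = TOY_PGTY_TABLE;	/* cf. __SetPageTable() */
	return p;
}

/* Empty, like pagetable_pud_ctor() after this patch. */
static void toy_pagetable_pud_ctor(struct toy_page *p)
{
	(void)p;
}

/* The free site clears the mark before the memory goes back. */
static void toy_pagetable_free(struct toy_page *p)
{
	assert(toy_page_is_table(p));
	p->type = TOY_PGTY_NONE;		/* cf. __ClearPageTable() */
	free(p);
}
```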

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h | 11 -----------
 mm/memory.c        |  2 ++
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 17f783c04c87..3111344b8d05 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3082,15 +3082,9 @@ static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
 static inline void ptlock_free(struct ptdesc *ptdesc) {}
 #endif /* defined(CONFIG_SPLIT_PTE_PTLOCKS) */
 
-static inline void __pagetable_ctor(struct ptdesc *ptdesc)
-{
-	__SetPageTable(ptdesc_page(ptdesc));
-}
-
 static inline void pagetable_dtor(struct ptdesc *ptdesc)
 {
 	ptlock_free(ptdesc);
-	__ClearPageTable(ptdesc_page(ptdesc));
 }
 
 static inline void pagetable_dtor_free(struct ptdesc *ptdesc)
@@ -3104,7 +3098,6 @@ static inline bool pagetable_pte_ctor(struct mm_struct *mm,
 {
 	if (mm != &init_mm && !ptlock_init(ptdesc))
 		return false;
-	__pagetable_ctor(ptdesc);
 	return true;
 }
 
@@ -3212,7 +3205,6 @@ static inline bool pagetable_pmd_ctor(struct mm_struct *mm,
 	if (mm != &init_mm && !pmd_ptlock_init(ptdesc))
 		return false;
 	ptdesc_pmd_pts_init(ptdesc);
-	__pagetable_ctor(ptdesc);
 	return true;
 }
 
@@ -3237,17 +3229,14 @@ static inline spinlock_t *pud_lock(struct mm_struct *mm, pud_t *pud)
 
 static inline void pagetable_pud_ctor(struct ptdesc *ptdesc)
 {
-	__pagetable_ctor(ptdesc);
 }
 
 static inline void pagetable_p4d_ctor(struct ptdesc *ptdesc)
 {
-	__pagetable_ctor(ptdesc);
 }
 
 static inline void pagetable_pgd_ctor(struct ptdesc *ptdesc)
 {
-	__pagetable_ctor(ptdesc);
 }
 
 extern void __init pagecache_init(void);
diff --git a/mm/memory.c b/mm/memory.c
index 35886fde189c..54480b12eb8c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -7358,6 +7358,7 @@ struct ptdesc *pagetable_alloc_noprof(gfp_t gfp, unsigned int order)
 
 	pgdat = NODE_DATA(page_to_nid(page));
 	mod_node_page_state(pgdat, NR_PAGETABLE, 1 << order);
+	__SetPageTable(page);
 	return page_ptdesc(page);
 }
 
@@ -7379,6 +7380,7 @@ void pagetable_free(struct ptdesc *pt)
 		return;
 	}
 	mod_node_page_state(pgdat, NR_PAGETABLE, -(1L << order));
+	__ClearPageTable(page);
 	free_frozen_pages(page, order);
 }
 
-- 
2.47.2




* [PATCH 4/4] pgtable: Remove uses of page->lru
  2025-11-13 14:04 [PATCH 0/4] Convert pgtable to use frozen pages Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2025-11-13 14:04 ` [PATCH 3/4] mm: Mark " Matthew Wilcox (Oracle)
@ 2025-11-13 14:04 ` Matthew Wilcox (Oracle)
  2025-11-20 13:56   ` David Hildenbrand (Red Hat)
  3 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox (Oracle) @ 2025-11-13 14:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox (Oracle), David Hildenbrand, Vishal Moola, linux-mm

Use ptdesc->pt_list instead of page->lru.  These are the same bits for
now, but will be different when ptdesc is allocated separately.
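The "same bits for now" property is what TABLE_MATCH() guards. Below is a
minimal sketch of that kind of compile-time layout check, assuming it boils
down to comparing offsetof() values with static_assert; the structures, the
`pt_mapping` field and the macro name are hypothetical toys. If pt_list were
later moved away from lru, the build would fail instead of silently aliasing
different fields.

```c
#include <assert.h>
#include <stddef.h>

/* Toy doubly linked list node, shaped like struct list_head. */
struct toy_list_head {
	struct toy_list_head *next, *prev;
};

/* Toy struct page: 'lru' and 'compound_head' share a slot. */
struct toy_page {
	unsigned long flags;
	union {
		struct toy_list_head lru;
		unsigned long compound_head;
	};
	void *mapping;
};

/* Toy ptdesc that overlays the same memory as struct toy_page. */
struct toy_ptdesc {
	unsigned long pt_flags;
	struct toy_list_head pt_list;
	void *pt_mapping;
};

/* Fails the build if the two fields do not start at the same offset. */
#define TOY_TABLE_MATCH(pg, pt)					\
	static_assert(offsetof(struct toy_page, pg) ==		\
		      offsetof(struct toy_ptdesc, pt),		\
		      "toy_page." #pg " must overlap toy_ptdesc." #pt)

TOY_TABLE_MATCH(flags, pt_flags);
TOY_TABLE_MATCH(compound_head, pt_list);
TOY_TABLE_MATCH(lru, pt_list);
TOY_TABLE_MATCH(mapping, pt_mapping);
static_assert(sizeof(struct toy_ptdesc) <= sizeof(struct toy_page),
	      "toy_ptdesc must fit inside toy_page");

/* Reinterpret a page as its descriptor, like page_ptdesc(). */
static inline struct toy_ptdesc *toy_page_ptdesc(struct toy_page *page)
{
	return (struct toy_ptdesc *)page;
}
```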

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm_types.h |  1 +
 mm/pgtable-generic.c     | 24 +++++++++++++++---------
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index a0f4bd6099cc..5e08c4a41777 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -602,6 +602,7 @@ struct ptdesc {
 TABLE_MATCH(flags, pt_flags);
 TABLE_MATCH(compound_head, pt_list);
 TABLE_MATCH(compound_head, _pt_pad_1);
+TABLE_MATCH(lru, pt_list);
 TABLE_MATCH(mapping, __page_mapping);
 TABLE_MATCH(__folio_index, pt_index);
 TABLE_MATCH(rcu_head, pt_rcu_head);
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 597049e21ac1..a3990c04b31e 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -166,13 +166,14 @@ pud_t pudp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address,
 void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
 				pgtable_t pgtable)
 {
+	struct ptdesc *ptdesc = page_ptdesc(pgtable);
 	assert_spin_locked(pmd_lockptr(mm, pmdp));
 
 	/* FIFO */
 	if (!pmd_huge_pte(mm, pmdp))
-		INIT_LIST_HEAD(&pgtable->lru);
+		INIT_LIST_HEAD(&ptdesc->pt_list);
 	else
-		list_add(&pgtable->lru, &pmd_huge_pte(mm, pmdp)->lru);
+		list_add(&ptdesc->pt_list, &page_ptdesc(pmd_huge_pte(mm, pmdp))->pt_list);
 	pmd_huge_pte(mm, pmdp) = pgtable;
 }
 #endif
@@ -181,17 +182,22 @@ void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
 /* no "address" argument so destroys page coloring of some arch */
 pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
 {
-	pgtable_t pgtable;
+	struct ptdesc *ptdesc, *next;
+	struct page *page;
 
 	assert_spin_locked(pmd_lockptr(mm, pmdp));
 
 	/* FIFO */
-	pgtable = pmd_huge_pte(mm, pmdp);
-	pmd_huge_pte(mm, pmdp) = list_first_entry_or_null(&pgtable->lru,
-							  struct page, lru);
-	if (pmd_huge_pte(mm, pmdp))
-		list_del(&pgtable->lru);
-	return pgtable;
+	page = pmd_huge_pte(mm, pmdp);
+	ptdesc = page_ptdesc(page);
+	next = list_first_entry_or_null(&ptdesc->pt_list, struct ptdesc, pt_list);
+	if (next) {
+		pmd_huge_pte(mm, pmdp) = ptdesc_page(next);
+		list_del(&ptdesc->pt_list);
+	} else {
+		pmd_huge_pte(mm, pmdp) = NULL;
+	}
+	return page;
 }
 #endif
 
-- 
2.47.2




* Re: [PATCH 1/4] mm: Use frozen pages for page tables
  2025-11-13 14:04 ` [PATCH 1/4] mm: Use frozen pages for page tables Matthew Wilcox (Oracle)
@ 2025-11-13 18:24   ` Vishal Moola (Oracle)
  2025-11-13 19:14     ` Vishal Moola (Oracle)
  2025-11-17 14:38   ` kernel test robot
  2025-11-19 15:46   ` Chih-En Lin
  2 siblings, 1 reply; 16+ messages in thread
From: Vishal Moola (Oracle) @ 2025-11-13 18:24 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, David Hildenbrand, linux-mm

On Thu, Nov 13, 2025 at 02:04:43PM +0000, Matthew Wilcox (Oracle) wrote:
> Page tables do not use the reference count.  That means we can avoid
> two atomic operations (one on alloc, one on free) by allocating frozen
> pages here.  This does not interfere with compaction as page tables are
> non-movable allocations.
> 
> pagetable_alloc() and pagetable_free() need to move out of line to make
> this work as alloc_frozen_page() and free_frozen_page() are not exported

Nit: s/page/pages/g

> outside the mm for now.  We'll want them out of line anyway soon.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

LGTM.
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>



* Re: [PATCH 1/4] mm: Use frozen pages for page tables
  2025-11-13 18:24   ` Vishal Moola (Oracle)
@ 2025-11-13 19:14     ` Vishal Moola (Oracle)
  2025-11-14 13:45       ` Matthew Wilcox
  2025-11-14 14:31       ` Will Deacon
  0 siblings, 2 replies; 16+ messages in thread
From: Vishal Moola (Oracle) @ 2025-11-13 19:14 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: Andrew Morton, David Hildenbrand, linux-mm, Will Deacon

On Thu, Nov 13, 2025 at 10:24:27AM -0800, Vishal Moola (Oracle) wrote:
> On Thu, Nov 13, 2025 at 02:04:43PM +0000, Matthew Wilcox (Oracle) wrote:
> > Page tables do not use the reference count.  That means we can avoid

While looking into the second patch, I came across this one instance in
sparc32. Commit 1996d47a0db ("sparc32: mm: Only call ctor()/dtor()
functions for first and last user") decided to start using the refcount
for something.

I'm not certain this code should even be using ptdescs, but there's a
page_ptdesc() call in there for now :(.

As far as I can tell, this is only a thing for sparc32. Cc-ing Will as
the author of those commits.

> > two atomic operations (one on alloc, one on free) by allocating frozen
> > pages here.  This does not interfere with compaction as page tables are
> > non-movable allocations.
> > 
> > pagetable_alloc() and pagetable_free() need to move out of line to make
> > this work as alloc_frozen_page() and free_frozen_page() are not exported
> 
> Nit: s/page/pages/g
> 
> > outside the mm for now.  We'll want them out of line anyway soon.
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> 
> LGTM.
> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>

So I guess I retract this for now...



* Re: [PATCH 2/4] mm: Account pagetable memory when allocated
  2025-11-13 14:04 ` [PATCH 2/4] mm: Account pagetable memory when allocated Matthew Wilcox (Oracle)
@ 2025-11-13 19:39   ` Vishal Moola (Oracle)
  0 siblings, 0 replies; 16+ messages in thread
From: Vishal Moola (Oracle) @ 2025-11-13 19:39 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: Andrew Morton, David Hildenbrand, linux-mm, Will Deacon, Qi Zheng

On Thu, Nov 13, 2025 at 02:04:44PM +0000, Matthew Wilcox (Oracle) wrote:
> Move the accounting from the constructor to the allocation site.
> Some of the architecture code is a little complex to reason about,

I think the patch is correct. After taking a look at all the paths, I
noticed 2 cases that I'm uncertain about:

1) Commit 454b0289c6b5 ("sparc32: mm: Don't try to free page-table pages
if ctor() fails") implies that sparc32 uses memblock pages, so we never
actually free them with pagetable_free(). I'm not sure how the
allocation path looks. Would this break the accounting? +Cc-ing Will.

2) In !CONFIG_MMU_GATHER_TABLE_FREE, tlb_remove_table() calls
pagetable_dtor() -> tlb_remove_page(). I'm wondering if this may run into
accounting issues?
Commit f21bb37afbba0 ("mm: pgtable: make generic tlb_remove_table() use
struct ptdesc") says it's only an arm thing that needs to be fixed
anyway. +Cc-ing Qi.

> but I think this is all correct (and slightly more efficient due
> to having 'order' as an argument instead of having to retrieve it
> from struct page again).
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/mm.h | 11 -----------
>  mm/memory.c        | 16 +++++++++++++---
>  2 files changed, 13 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index e168ee23091e..17f783c04c87 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3082,26 +3082,15 @@ static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
>  static inline void ptlock_free(struct ptdesc *ptdesc) {}
>  #endif /* defined(CONFIG_SPLIT_PTE_PTLOCKS) */
>  
> -static inline unsigned long ptdesc_nr_pages(const struct ptdesc *ptdesc)
> -{
> -	return compound_nr(ptdesc_page(ptdesc));
> -}
> -
>  static inline void __pagetable_ctor(struct ptdesc *ptdesc)
>  {
> -	pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags));
> -
>  	__SetPageTable(ptdesc_page(ptdesc));
> -	mod_node_page_state(pgdat, NR_PAGETABLE, ptdesc_nr_pages(ptdesc));
>  }
>  
>  static inline void pagetable_dtor(struct ptdesc *ptdesc)
>  {
> -	pg_data_t *pgdat = NODE_DATA(memdesc_nid(ptdesc->pt_flags));
> -
>  	ptlock_free(ptdesc);
>  	__ClearPageTable(ptdesc_page(ptdesc));
> -	mod_node_page_state(pgdat, NR_PAGETABLE, -ptdesc_nr_pages(ptdesc));
>  }
>  
>  static inline void pagetable_dtor_free(struct ptdesc *ptdesc)
> diff --git a/mm/memory.c b/mm/memory.c
> index 781cd7f607f7..35886fde189c 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -7351,7 +7351,13 @@ long copy_folio_from_user(struct folio *dst_folio,
>  struct ptdesc *pagetable_alloc_noprof(gfp_t gfp, unsigned int order)
>  {
>  	struct page *page = alloc_frozen_pages_noprof(gfp | __GFP_COMP, order);
> +	pg_data_t *pgdat;
>  
> +	if (!page)
> +		return NULL;
> +
> +	pgdat = NODE_DATA(page_to_nid(page));
> +	mod_node_page_state(pgdat, NR_PAGETABLE, 1 << order);
>  	return page_ptdesc(page);
>  }
>  
> @@ -7364,12 +7370,16 @@ struct ptdesc *pagetable_alloc_noprof(gfp_t gfp, unsigned int order)
>   */
>  void pagetable_free(struct ptdesc *pt)
>  {
> +	pg_data_t *pgdat = NODE_DATA(memdesc_nid(pt->pt_flags));
>  	struct page *page = ptdesc_page(pt);
> +	unsigned int order = compound_order(page);
>  
> -	if (ptdesc_test_kernel(pt))
> +	if (ptdesc_test_kernel(pt)) {
>  		pagetable_free_kernel(pt);
> -	else
> -		free_frozen_pages(page, compound_order(page));
> +		return;
> +	}
> +	mod_node_page_state(pgdat, NR_PAGETABLE, -(1L << order));
> +	free_frozen_pages(page, order);
>  }
>  
>  #if defined(CONFIG_SPLIT_PTE_PTLOCKS) && ALLOC_SPLIT_PTLOCKS
> -- 
> 2.47.2
> 



* Re: [PATCH 1/4] mm: Use frozen pages for page tables
  2025-11-13 19:14     ` Vishal Moola (Oracle)
@ 2025-11-14 13:45       ` Matthew Wilcox
  2025-11-14 14:31       ` Will Deacon
  1 sibling, 0 replies; 16+ messages in thread
From: Matthew Wilcox @ 2025-11-14 13:45 UTC (permalink / raw)
  To: Vishal Moola (Oracle)
  Cc: Andrew Morton, David Hildenbrand, linux-mm, Will Deacon

On Thu, Nov 13, 2025 at 11:14:38AM -0800, Vishal Moola (Oracle) wrote:
> On Thu, Nov 13, 2025 at 10:24:27AM -0800, Vishal Moola (Oracle) wrote:
> > On Thu, Nov 13, 2025 at 02:04:43PM +0000, Matthew Wilcox (Oracle) wrote:
> > > Page tables do not use the reference count.  That means we can avoid
> 
> While looking into the second patch, I came across this one instance in
> sparc32. Commit 1996d47a0db ("sparc32: mm: Only call ctor()/dtor()
> functions for first and last user") decided to start using the refcount
> for something.
> 
> I'm not certain this code should even be using ptdescs, but there's a
> page_ptdesc() call in there for now :(.
> 
> As far as I can tell, this is only a thing for sparc32. Cc-ing Will as
> the author of those commits.

I think you're right that this is a problem for the ptdesc project,
but not (I think) a problem for this patch.  Maybe for the wording.

This patch changes pagetable_alloc() / pagetable_free() to use frozen
pages.  But sparc32 doesn't use pagetable_alloc() (sparc64 does).
So this patch should not break sparc32, and it can continue to use the
page refcount like this.

Where we do run into problems with this series is in patch 2
where I move the mod_node_page_state() from pagetable_pte_ctor() to
pagetable_alloc_noprof().  Since sparc32 doesn't call pagetable_alloc(),
its page tables will not be accounted.

I think I have a solution, but it'll take at least some build testing.
Looks like sparc64-linux-gnu-gcc at least accepts a -m32 flag, so I'll
give it a go.
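The accounting gap can be sketched with a toy counter pair (hypothetical
names, order-0 tables only). It assumes, as described in this thread, that
sparc32 takes page-table memory from its own pool and still calls the
constructor, but never goes through pagetable_alloc().

```c
#include <assert.h>

/* Where a toy page table's memory came from. */
enum toy_source {
	TOY_FROM_PAGETABLE_ALLOC,	/* generic pagetable_alloc() path */
	TOY_FROM_ARCH_POOL,		/* private arch pool; bypasses it */
};

struct toy_counts {
	long ctor_scheme;	/* accounted in the constructor (before) */
	long alloc_scheme;	/* accounted in the allocator (after) */
};

/* One order-0 page table that, in both cases, is passed to the ctor. */
static void toy_new_pagetable(struct toy_counts *c, enum toy_source src)
{
	/* Before: the constructor counted every table it saw. */
	c->ctor_scheme += 1;
	/* After: only the allocator counts, so the arch pool is missed. */
	if (src == TOY_FROM_PAGETABLE_ALLOC)
		c->alloc_scheme += 1;
}
```

Calling `toy_new_pagetable()` once per source leaves `ctor_scheme` at 2 but
`alloc_scheme` at 1: the pool-backed table is no longer counted.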



* Re: [PATCH 1/4] mm: Use frozen pages for page tables
  2025-11-13 19:14     ` Vishal Moola (Oracle)
  2025-11-14 13:45       ` Matthew Wilcox
@ 2025-11-14 14:31       ` Will Deacon
  1 sibling, 0 replies; 16+ messages in thread
From: Will Deacon @ 2025-11-14 14:31 UTC (permalink / raw)
  To: Vishal Moola (Oracle)
  Cc: Matthew Wilcox (Oracle), Andrew Morton, David Hildenbrand, linux-mm

On Thu, Nov 13, 2025 at 11:14:38AM -0800, Vishal Moola (Oracle) wrote:
> On Thu, Nov 13, 2025 at 10:24:27AM -0800, Vishal Moola (Oracle) wrote:
> > On Thu, Nov 13, 2025 at 02:04:43PM +0000, Matthew Wilcox (Oracle) wrote:
> > > Page tables do not use the reference count.  That means we can avoid
> 
> While looking into the second patch, I came across this one instance in
> sparc32. Commit 1996d47a0db ("sparc32: mm: Only call ctor()/dtor()
> functions for first and last user") decided to start using the refcount
> for something.
> 
> I'm not certain this code should even be using ptdescs, but there's a
> page_ptdesc() call in there for now :(.
> 
> As far as I can tell, this is only a thing for sparc32. Cc-ing Will as
> the author of those commits.

I'm afraid I've forgotten most of this as the only reason I found myself
hacking on sparc32 back in 2020 was because I was reworking READ_ONCE()
to operate only on scalar types and the srmmu code needed some surgery
to make that work.

However, from what I can tell/recall, pte_alloc_one() ends up calling
srmmu_get_nocache() which uses a custom allocator backed by a memblock
allocation from memblock_alloc_or_panic(). See srmmu_nocache_init().

Will



* Re: [PATCH 1/4] mm: Use frozen pages for page tables
  2025-11-13 14:04 ` [PATCH 1/4] mm: Use frozen pages for page tables Matthew Wilcox (Oracle)
  2025-11-13 18:24   ` Vishal Moola (Oracle)
@ 2025-11-17 14:38   ` kernel test robot
  2025-11-18  0:44     ` Vishal Moola (Oracle)
  2025-11-19 15:46   ` Chih-En Lin
  2 siblings, 1 reply; 16+ messages in thread
From: kernel test robot @ 2025-11-17 14:38 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: oe-lkp, lkp, linux-mm, Andrew Morton, Matthew Wilcox (Oracle),
	David Hildenbrand, Vishal Moola, oliver.sang



Hello,

kernel test robot noticed "BUG:Bad_page_state_in_process" on:

commit: ffb870b766822062b6c71211c80342c85a7ffcd8 ("[PATCH 1/4] mm: Use frozen pages for page tables")
url: https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/mm-Use-frozen-pages-for-page-tables/20251113-222907
base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/all/20251113140448.1814860-2-willy@infradead.org/
patch subject: [PATCH 1/4] mm: Use frozen pages for page tables

in testcase: rcutorture
version: 
with following parameters:

	runtime: 300s
	test: cpuhotplug
	torture_type: trivial



config: x86_64-randconfig-101-20251114
compiler: clang-20
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 32G

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202511172257.ffd96dab-lkp@intel.com


[   19.289760][  T422] BUG: Bad page state in process modprobe  pfn:1618b2
[   19.290414][  T422] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1618b2
[   19.291313][  T422] flags: 0x8000000000000000(zone=2)
[   19.291714][  T422] raw: 8000000000000000 dead000000000100 dead000000000122 0000000000000000
[   19.292382][  T422] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[   19.293020][  T422] page dumped because: nonzero _refcount
[   19.293444][  T422] Modules linked in:
[   19.293804][  T422] CPU: 0 UID: 0 PID: 422 Comm: modprobe Not tainted 6.18.0-rc5-00422-gffb870b76682 #1 PREEMPT(none)  65c9d11eede624b36533d4efe2c3c7798fc76b60
[   19.293811][  T422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   19.293814][  T422] Call Trace:
[   19.293817][  T422]  <TASK>
[   19.293820][  T422]  dump_stack_lvl (lib/dump_stack.c:123)
[   19.293834][  T422]  ? show_regs_print_info (lib/dump_stack.c:104)
[   19.293842][  T422]  ? smp_call_function_many (kernel/smp.c:784)
[   19.293847][  T422]  ? find_held_lock (kernel/locking/lockdep.c:5350)
[   19.293854][  T422]  bad_page (mm/page_alloc.c:?)
[   19.293860][  T422]  __free_frozen_pages (mm/page_alloc.c:?)
[   19.293870][  T422]  change_page_attr_set_clr (include/linux/list.h:372)
[   19.293878][  T422]  ? __set_memory_prot (arch/x86/mm/pat/set_memory.c:2041)
[   19.293884][  T422]  ? __set_memory_prot (arch/x86/mm/pat/set_memory.c:2041)
[   19.293889][  T422]  ? trace_contention_end (include/trace/events/lock.h:122)
[   19.293897][  T422]  ? do_raw_spin_lock (arch/x86/include/asm/atomic.h:107)
[   19.293904][  T422]  set_memory_rox (arch/x86/mm/pat/set_memory.c:2327)
[   19.293910][  T422]  ? set_memory_nx (arch/x86/mm/pat/set_memory.c:2123 arch/x86/mm/pat/set_memory.c:2312)
[   19.293915][  T422]  ? set_memory_ro (arch/x86/mm/pat/set_memory.c:2321)
[   19.293921][  T422]  ? _raw_spin_unlock (arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:186)
[   19.293929][  T422]  ? find_vmap_area (mm/vmalloc.c:2507)
[   19.293935][  T422]  module_enable_text_rox (kernel/module/strict_rwx.c:40)
[   19.293943][  T422]  complete_formation (kernel/module/main.c:3258)
[   19.293952][  T422]  ? post_relocation (kernel/module/main.c:3237)
[   19.293959][  T422]  ? init_build_id (kernel/module/kallsyms.c:?)
[   19.293967][  T422]  load_module (kernel/module/main.c:3468)
[   19.293979][  T422]  __se_sys_finit_module (kernel/module/main.c:? kernel/module/main.c:3713 kernel/module/main.c:3739 kernel/module/main.c:3723)
[   19.293987][  T422]  ? __x64_sys_finit_module (kernel/module/main.c:3723)
[   19.293998][  T422]  ? exc_page_fault (arch/x86/mm/fault.c:?)
[   19.294007][  T422]  ? __ia32_sys_write (fs/read_write.c:754)
[   19.294015][  T422]  ? do_sys_open (fs/open.c:1452)
[   19.294022][  T422]  ? entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[   19.294026][  T422]  do_syscall_64 (arch/x86/entry/syscall_64.c:?)
[   19.294034][  T422]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[   19.294038][  T422] RIP: 0033:0x7f8d36fda779
[   19.294042][  T422] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 86 0d 00 f7 d8 64 89 01 48
All code
========
   0:	ff c3                	inc    %ebx
   2:	66 2e 0f 1f 84 00 00 	cs nopw 0x0(%rax,%rax,1)
   9:	00 00 00 
   c:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  11:	48 89 f8             	mov    %rdi,%rax
  14:	48 89 f7             	mov    %rsi,%rdi
  17:	48 89 d6             	mov    %rdx,%rsi
  1a:	48 89 ca             	mov    %rcx,%rdx
  1d:	4d 89 c2             	mov    %r8,%r10
  20:	4d 89 c8             	mov    %r9,%r8
  23:	4c 8b 4c 24 08       	mov    0x8(%rsp),%r9
  28:	0f 05                	syscall
  2a:*	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax		<-- trapping instruction
  30:	73 01                	jae    0x33
  32:	c3                   	ret
  33:	48 8b 0d 4f 86 0d 00 	mov    0xd864f(%rip),%rcx        # 0xd8689
  3a:	f7 d8                	neg    %eax
  3c:	64 89 01             	mov    %eax,%fs:(%rcx)
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
   6:	73 01                	jae    0x9
   8:	c3                   	ret
   9:	48 8b 0d 4f 86 0d 00 	mov    0xd864f(%rip),%rcx        # 0xd865f
  10:	f7 d8                	neg    %eax
  12:	64 89 01             	mov    %eax,%fs:(%rcx)
  15:	48                   	rex.W
[   19.294046][  T422] RSP: 002b:00007ffe07ac3298 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   19.294051][  T422] RAX: ffffffffffffffda RBX: 000055b5fb23ae30 RCX: 00007f8d36fda779
[   19.294054][  T422] RDX: 0000000000000000 RSI: 000055b5e55e332b RDI: 0000000000000004
[   19.294056][  T422] RBP: 0000000000000000 R08: 0000000000000000 R09: 000055b5fb23c020
[   19.294059][  T422] R10: 0000000000000000 R11: 0000000000000246 R12: 000055b5e55e332b
[   19.294061][  T422] R13: 0000000000040000 R14: 000055b5fb23ade0 R15: 0000000000000000
[   19.294069][  T422]  </TASK>
[   19.294071][  T422] Disabling lock debugging due to kernel taint
[   19.373082][  T422] BUG: Bad page state in process modprobe  pfn:163532
[   19.373680][  T422] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x163532
[   19.374387][  T422] flags: 0x8000000000000000(zone=2)
[   19.374795][  T422] raw: 8000000000000000 dead000000000100 dead000000000122 0000000000000000
[   19.375424][  T422] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[   19.376107][  T422] page dumped because: nonzero _refcount
[   19.376525][  T422] Modules linked in: torture
[   19.376917][  T422] CPU: 0 UID: 0 PID: 422 Comm: modprobe Tainted: G    B               6.18.0-rc5-00422-gffb870b76682 #1 PREEMPT(none)  65c9d11eede624b36533d4efe2c3c7798fc76b60
[   19.376925][  T422] Tainted: [B]=BAD_PAGE
[   19.376927][  T422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   19.376930][  T422] Call Trace:
[   19.376933][  T422]  <TASK>
[   19.376936][  T422]  dump_stack_lvl (lib/dump_stack.c:123)
[   19.376946][  T422]  ? show_regs_print_info (lib/dump_stack.c:104)
[   19.376952][  T422]  ? smp_call_function_many (kernel/smp.c:784)
[   19.376959][  T422]  bad_page (mm/page_alloc.c:?)
[   19.376964][  T422]  __free_frozen_pages (mm/page_alloc.c:?)
[   19.376972][  T422]  change_page_attr_set_clr (include/linux/list.h:372)
[   19.376979][  T422]  ? __set_memory_prot (arch/x86/mm/pat/set_memory.c:2041)
[   19.376984][  T422]  ? __set_memory_prot (arch/x86/mm/pat/set_memory.c:2041)
[   19.376989][  T422]  ? trace_contention_end (include/trace/events/lock.h:122)
[   19.376995][  T422]  ? do_raw_spin_lock (arch/x86/include/asm/atomic.h:107)
[   19.377001][  T422]  set_memory_rox (arch/x86/mm/pat/set_memory.c:2327)
[   19.377006][  T422]  ? set_memory_nx (arch/x86/mm/pat/set_memory.c:2123 arch/x86/mm/pat/set_memory.c:2312)
[   19.377010][  T422]  ? set_memory_ro (arch/x86/mm/pat/set_memory.c:2321)
[   19.377016][  T422]  ? _raw_spin_unlock (arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:186)
[   19.377023][  T422]  ? find_vmap_area (mm/vmalloc.c:2507)
[   19.377028][  T422]  module_enable_text_rox (kernel/module/strict_rwx.c:40)
[   19.377036][  T422]  complete_formation (kernel/module/main.c:3258)
[   19.377042][  T422]  ? __might_fault (mm/memory.c:7142)
[   19.377046][  T422]  ? post_relocation (kernel/module/main.c:3237)
[   19.377051][  T422]  ? __might_fault (mm/memory.c:7142)
[   19.377054][  T422]  ? init_build_id (kernel/module/kallsyms.c:?)
[   19.377061][  T422]  load_module (kernel/module/main.c:3468)
[   19.377069][  T422]  __se_sys_finit_module (kernel/module/main.c:? kernel/module/main.c:3713 kernel/module/main.c:3739 kernel/module/main.c:3723)
[   19.377074][  T422]  ? __x64_sys_finit_module (kernel/module/main.c:3723)
[   19.377081][  T422]  ? do_sys_openat2 (fs/open.c:1447)
[   19.377089][  T422]  ? __ia32_sys_write (fs/read_write.c:754)
[   19.377095][  T422]  ? do_sys_open (fs/open.c:1452)
[   19.377100][  T422]  ? entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[   19.377104][  T422]  do_syscall_64 (arch/x86/entry/syscall_64.c:?)
[   19.377111][  T422]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[   19.377115][  T422] RIP: 0033:0x7f8d36fda779
[   19.377120][  T422] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 86 0d 00 f7 d8 64 89 01 48
All code
========
   0:	ff c3                	inc    %ebx
   2:	66 2e 0f 1f 84 00 00 	cs nopw 0x0(%rax,%rax,1)
   9:	00 00 00 
   c:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  11:	48 89 f8             	mov    %rdi,%rax
  14:	48 89 f7             	mov    %rsi,%rdi
  17:	48 89 d6             	mov    %rdx,%rsi
  1a:	48 89 ca             	mov    %rcx,%rdx
  1d:	4d 89 c2             	mov    %r8,%r10
  20:	4d 89 c8             	mov    %r9,%r8
  23:	4c 8b 4c 24 08       	mov    0x8(%rsp),%r9
  28:	0f 05                	syscall
  2a:*	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax		<-- trapping instruction
  30:	73 01                	jae    0x33
  32:	c3                   	ret
  33:	48 8b 0d 4f 86 0d 00 	mov    0xd864f(%rip),%rcx        # 0xd8689
  3a:	f7 d8                	neg    %eax
  3c:	64 89 01             	mov    %eax,%fs:(%rcx)
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
   6:	73 01                	jae    0x9
   8:	c3                   	ret
   9:	48 8b 0d 4f 86 0d 00 	mov    0xd864f(%rip),%rcx        # 0xd865f
  10:	f7 d8                	neg    %eax
  12:	64 89 01             	mov    %eax,%fs:(%rcx)
  15:	48                   	rex.W
[   19.377123][  T422] RSP: 002b:00007ffe07ac3298 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   19.377128][  T422] RAX: ffffffffffffffda RBX: 000055b5fb23ac50 RCX: 00007f8d36fda779
[   19.377131][  T422] RDX: 0000000000000000 RSI: 000055b5fb23aff0 RDI: 0000000000000005
[   19.377134][  T422] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[   19.377136][  T422] R10: 0000000000000000 R11: 0000000000000246 R12: 000055b5fb23aff0
[   19.377139][  T422] R13: 0000000000040000 R14: 000055b5fb23ad80 R15: 0000000000000000
[   19.377143][  T422]  </TASK>


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20251117/202511172257.ffd96dab-lkp@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] mm: Use frozen pages for page tables
  2025-11-17 14:38   ` kernel test robot
@ 2025-11-18  0:44     ` Vishal Moola (Oracle)
  0 siblings, 0 replies; 16+ messages in thread
From: Vishal Moola (Oracle) @ 2025-11-18  0:44 UTC (permalink / raw)
  To: kernel test robot
  Cc: Matthew Wilcox (Oracle),
	oe-lkp, lkp, linux-mm, Andrew Morton, David Hildenbrand

On Mon, Nov 17, 2025 at 10:38:09PM +0800, kernel test robot wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed "BUG:Bad_page_state_in_process" on:
> 
> commit: ffb870b766822062b6c71211c80342c85a7ffcd8 ("[PATCH 1/4] mm: Use frozen pages for page tables")
> url: https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/mm-Use-frozen-pages-for-page-tables/20251113-222907
> base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
> patch link: https://lore.kernel.org/all/20251113140448.1814860-2-willy@infradead.org/
> patch subject: [PATCH 1/4] mm: Use frozen pages for page tables
> 
> in testcase: rcutorture
> version: 
> with following parameters:
> 
> 	runtime: 300s
> 	test: cpuhotplug
> 	torture_type: trivial
> 
> 
> 
> config: x86_64-randconfig-101-20251114
> compiler: clang-20
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 32G
> 
> (please refer to attached dmesg/kmsg for entire log/backtrace)
> 
> 
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202511172257.ffd96dab-lkp@intel.com
> 
> 
> [   19.289760][  T422] BUG: Bad page state in process modprobe  pfn:1618b2
> [   19.290414][  T422] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1618b2
> [   19.291313][  T422] flags: 0x8000000000000000(zone=2)
> [   19.291714][  T422] raw: 8000000000000000 dead000000000100 dead000000000122 0000000000000000
> [   19.292382][  T422] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> [   19.293020][  T422] page dumped because: nonzero _refcount
> [   19.293444][  T422] Modules linked in:
> [   19.293804][  T422] CPU: 0 UID: 0 PID: 422 Comm: modprobe Not tainted 6.18.0-rc5-00422-gffb870b76682 #1 PREEMPT(none)  65c9d11eede624b36533d4efe2c3c7798fc76b60
> [   19.293811][  T422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> [...]

This is not a problem with this patch. It's actually a symptom of commit
bf9e4e30f3538 ("x86/mm: use pagetable_free()"). We're freeing ptdescs
that haven't been allocated from the ptdesc allocator - aka
pagetable_alloc().
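A rough user-space model of the failure mode described above. It assumes the only relevant difference is the refcount. With this patch, pagetable_free() hands pages to free_frozen_pages(), which expects a zero refcount. A page taken from plain alloc_pages() still carries the reference it was allocated with, so the free-path check reports "nonzero _refcount", the reason printed in the dumps above. All model_* names are hypothetical; this is a sketch, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for struct page. */
struct model_page {
	int refcount;
};

/* Models alloc_pages(): the caller owns one reference. */
static void model_alloc_pages(struct model_page *page)
{
	page->refcount = 1;
}

/* Models alloc_frozen_pages(): no reference is taken. */
static void model_alloc_frozen_pages(struct model_page *page)
{
	page->refcount = 0;
}

/*
 * Models free_frozen_pages(): the page must already have a zero
 * refcount; anything else is reported, loosely mirroring bad_page().
 * Returns true if the allocator would accept the page back.
 */
static bool model_free_frozen_pages(const struct model_page *page)
{
	if (page->refcount != 0) {
		printf("BUG: Bad page state: nonzero _refcount (%d)\n",
		       page->refcount);
		return false;
	}
	return true;
}

/* Models __free_pages(): drop the caller's reference, then free. */
static bool model_free_pages(struct model_page *page)
{
	if (--page->refcount != 0)
		return true;	/* another reference is still held */
	return model_free_frozen_pages(page);
}
```

Pairing the frozen allocator with the frozen free path, or alloc_pages() with __free_pages(), balances in this model. Mixing alloc_pages() with the frozen free path is the combination it flags.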


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] mm: Mark pagetable memory when allocated
  2025-11-13 14:04 ` [PATCH 3/4] mm: Mark " Matthew Wilcox (Oracle)
@ 2025-11-18 17:00   ` David Hildenbrand (Red Hat)
  0 siblings, 0 replies; 16+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-18 17:00 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton; +Cc: Vishal Moola, linux-mm

On 13.11.25 15:04, Matthew Wilcox (Oracle) wrote:
> Move the page type setting from the constructor to the allocation site.
> Some of the architecture code is a little complex to reason about, but
> I think this is all correct. 

IIUC, __ClearPageTable(page) and __SetPageTable(page) already check that 
no other/the expected type is set.

So I guess we'd find out rather soon whether we are missing something :)

Moving it to alloc+free makes perfect sense to me.

-- 
Cheers

David
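The checks described in the message above can be sketched roughly as follows. This simplified model assumes a single 32-bit page-type word in which all-ones means "no type". The model_* names and the MODEL_TYPE_TABLE encoding are hypothetical, and the kernel's actual helpers are not reproduced here. Setting the type asserts that no other type is present, and clearing it asserts that the expected type is present, so a mismatch between the allocation and free sites trips an assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NO_TYPE		UINT32_MAX	/* all ones: no page type set */
#define MODEL_TYPE_TABLE	0xf2000000u	/* hypothetical encoding */

/* Hypothetical stand-in for the page-type word in struct page. */
struct model_page {
	uint32_t page_type;
};

static bool model_page_is_table(const struct model_page *page)
{
	return page->page_type == MODEL_TYPE_TABLE;
}

/* At allocation: no other type may already be set. */
static void model_set_page_table(struct model_page *page)
{
	assert(page->page_type == MODEL_NO_TYPE);
	page->page_type = MODEL_TYPE_TABLE;
}

/* At free: the expected type must be set. */
static void model_clear_page_table(struct model_page *page)
{
	assert(model_page_is_table(page));
	page->page_type = MODEL_NO_TYPE;
}
```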


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] mm: Use frozen pages for page tables
  2025-11-13 14:04 ` [PATCH 1/4] mm: Use frozen pages for page tables Matthew Wilcox (Oracle)
  2025-11-13 18:24   ` Vishal Moola (Oracle)
  2025-11-17 14:38   ` kernel test robot
@ 2025-11-19 15:46   ` Chih-En Lin
  2025-11-20 13:55     ` David Hildenbrand (Red Hat)
  2 siblings, 1 reply; 16+ messages in thread
From: Chih-En Lin @ 2025-11-19 15:46 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: Andrew Morton, David Hildenbrand, Vishal Moola, linux-mm

On Thu, Nov 13, 2025 at 02:04:43PM +0000, Matthew Wilcox (Oracle) wrote:
> Page tables do not use the reference count.  That means we can avoid
> two atomic operations (one on alloc, one on free) by allocating frozen
> pages here.  This does not interfere with compaction as page tables are
> non-movable allocations.
> 
> pagetable_alloc() and pagetable_free() need to move out of line to make
> this work as alloc_frozen_page() and free_frozen_page() are not exported
> outside the mm for now.  We'll want them out of line anyway soon.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/mm.h   | 53 +++++---------------------------------------
>  mm/memory.c          | 34 ++++++++++++++++++++++++++++
>  mm/pgtable-generic.c |  3 ++-
>  3 files changed, 42 insertions(+), 48 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 5087deecdd9c..e168ee23091e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2995,58 +2995,17 @@ static inline void ptdesc_clear_kernel(struct ptdesc *ptdesc)
>   */
>  static inline bool ptdesc_test_kernel(const struct ptdesc *ptdesc)
>  {
> +#ifdef CONFIG_ASYNC_KERNEL_PGTABLE_FREE
>  	return test_bit(PT_kernel, &ptdesc->pt_flags.f);
> +#else
> +	return false;
> +#endif
>  }
>  
> -/**
> - * pagetable_alloc - Allocate pagetables
> - * @gfp:    GFP flags
> - * @order:  desired pagetable order
> - *
> - * pagetable_alloc allocates memory for page tables as well as a page table
> - * descriptor to describe that memory.
> - *
> - * Return: The ptdesc describing the allocated page tables.
> - */
> -static inline struct ptdesc *pagetable_alloc_noprof(gfp_t gfp, unsigned int order)
> -{
> -	struct page *page = alloc_pages_noprof(gfp | __GFP_COMP, order);
> -
> -	return page_ptdesc(page);
> -}
> +struct ptdesc *pagetable_alloc_noprof(gfp_t gfp, unsigned int order);
>  #define pagetable_alloc(...)	alloc_hooks(pagetable_alloc_noprof(__VA_ARGS__))
> -
> -static inline void __pagetable_free(struct ptdesc *pt)
> -{
> -	struct page *page = ptdesc_page(pt);
> -
> -	__free_pages(page, compound_order(page));
> -}
> -
> -#ifdef CONFIG_ASYNC_KERNEL_PGTABLE_FREE
> +void pagetable_free(struct ptdesc *pt);
>  void pagetable_free_kernel(struct ptdesc *pt);
> -#else
> -static inline void pagetable_free_kernel(struct ptdesc *pt)
> -{
> -	__pagetable_free(pt);
> -}
> -#endif
> -/**
> - * pagetable_free - Free pagetables
> - * @pt:	The page table descriptor
> - *
> - * pagetable_free frees the memory of all page tables described by a page
> - * table descriptor and the memory for the descriptor itself.
> - */
> -static inline void pagetable_free(struct ptdesc *pt)
> -{
> -	if (ptdesc_test_kernel(pt)) {
> -		ptdesc_clear_kernel(pt);
> -		pagetable_free_kernel(pt);
> -	} else {
> -		__pagetable_free(pt);
> -	}
> -}
>  
>  #if defined(CONFIG_SPLIT_PTE_PTLOCKS)
>  #if ALLOC_SPLIT_PTLOCKS
> diff --git a/mm/memory.c b/mm/memory.c
> index 1c66ee83a7ab..781cd7f607f7 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -7338,6 +7338,40 @@ long copy_folio_from_user(struct folio *dst_folio,
>  }
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
>  
> +/**
> + * pagetable_alloc - Allocate pagetables
> + * @gfp:    GFP flags
> + * @order:  desired pagetable order
> + *
> + * pagetable_alloc allocates memory for page tables as well as a page table
> + * descriptor to describe that memory.
> + *
> + * Return: The ptdesc describing the allocated page tables.
> + */
> +struct ptdesc *pagetable_alloc_noprof(gfp_t gfp, unsigned int order)
> +{
> +	struct page *page = alloc_frozen_pages_noprof(gfp | __GFP_COMP, order);
> +
> +	return page_ptdesc(page);
> +}
> +
> +/**
> + * pagetable_free - Free pagetables
> + * @pt:	The page table descriptor
> + *
> + * pagetable_free frees the memory of all page tables described by a page
> + * table descriptor and the memory for the descriptor itself.
> + */
> +void pagetable_free(struct ptdesc *pt)
> +{
> +	struct page *page = ptdesc_page(pt);
> +
> +	if (ptdesc_test_kernel(pt))
> +		pagetable_free_kernel(pt);

Should we use test_and_clear_bit() here to prevent a double free?
Or is it unnecessary because the caller guarantees that no other
thread will free the same page tables?

> +	else
> +		free_frozen_pages(page, compound_order(page));
> +}
> +
>  #if defined(CONFIG_SPLIT_PTE_PTLOCKS) && ALLOC_SPLIT_PTLOCKS
>  
>  static struct kmem_cache *page_ptl_cachep;
> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
> index d3aec7a9926a..597049e21ac1 100644
> --- a/mm/pgtable-generic.c
> +++ b/mm/pgtable-generic.c
> @@ -434,11 +434,12 @@ static void kernel_pgtable_work_func(struct work_struct *work)
>  
>  	iommu_sva_invalidate_kva_range(PAGE_OFFSET, TLB_FLUSH_ALL);
>  	list_for_each_entry_safe(pt, next, &page_list, pt_list)
> -		__pagetable_free(pt);
> +		pagetable_free(pt);
>  }
>  
>  void pagetable_free_kernel(struct ptdesc *pt)
>  {
> +	ptdesc_clear_kernel(pt);
>  	spin_lock(&kernel_pgtable_work.lock);
>  	list_add(&pt->pt_list, &kernel_pgtable_work.list);
>  	spin_unlock(&kernel_pgtable_work.lock);
> -- 
> 2.47.2
> 
>

Thanks,
Chih-En Lin


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] mm: Use frozen pages for page tables
  2025-11-19 15:46   ` Chih-En Lin
@ 2025-11-20 13:55     ` David Hildenbrand (Red Hat)
  0 siblings, 0 replies; 16+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-20 13:55 UTC (permalink / raw)
  To: Chih-En Lin, Matthew Wilcox (Oracle)
  Cc: Andrew Morton, Vishal Moola, linux-mm


>> +{
>> +	struct page *page = ptdesc_page(pt);
>> +
>> +	if (ptdesc_test_kernel(pt))
>> +		pagetable_free_kernel(pt);
> 
> Should we use test_and_clear_bit() here to prevent a double free?
> Or is it unnecessary because the caller guarantees that no other
> thread will free the same page tables?

If that were the case, wouldn't it be horribly broken already?

-- 
Cheers

David
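The question and the answer can be illustrated with a small model. An atomic test-and-clear guarantees that only one of two racing callers observes the kernel bit. The other caller would then take the ordinary free path and free the same page anyway. The sketch below uses C11 atomics. MODEL_PT_KERNEL is a hypothetical stand-in for PT_kernel, and the model_* helpers are illustrative only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_PT_KERNEL	0x1u	/* hypothetical stand-in for PT_kernel */

/*
 * Test, then clear separately (the shape of the quoted patch): two
 * callers racing on the same flags word can both see the bit as set.
 */
static bool model_test_then_clear(atomic_uint *flags)
{
	if (!(atomic_load(flags) & MODEL_PT_KERNEL))
		return false;
	atomic_fetch_and(flags, ~MODEL_PT_KERNEL);
	return true;
}

/* Atomic read-modify-write: at most one caller sees the bit as set. */
static bool model_test_and_clear(atomic_uint *flags)
{
	return atomic_fetch_and(flags, ~MODEL_PT_KERNEL) & MODEL_PT_KERNEL;
}
```

In either variant, a second pagetable_free() of the same ptdesc that finds the bit clear would continue to free_frozen_pages(). The atomic version changes only which path the double free takes, which is consistent with the reply that such a caller would already be broken.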


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] pgtable: Remove uses of page->lru
  2025-11-13 14:04 ` [PATCH 4/4] pgtable: Remove uses of page->lru Matthew Wilcox (Oracle)
@ 2025-11-20 13:56   ` David Hildenbrand (Red Hat)
  0 siblings, 0 replies; 16+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-20 13:56 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton; +Cc: Vishal Moola, linux-mm

On 11/13/25 15:04, Matthew Wilcox (Oracle) wrote:
> Use ptdesc->pt_list instead of page->lru.  These are the same bits for
> now, but will be different when ptdesc is allocated separately.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---


LGTM

Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>

-- 
Cheers

David
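The aliasing described in the quoted commit message can be pictured with a pair of hypothetical structs. While a ptdesc overlays struct page, pt_list occupies the same bytes as lru, so either name reaches the same list pointers. Once ptdescs are allocated separately, that stops being true. The layout below is invented for illustration and does not match the real struct page or struct ptdesc. The static assertions record the overlay assumption the model relies on.

```c
#include <assert.h>
#include <stddef.h>

struct model_list_head {
	struct model_list_head *next, *prev;
};

/* Hypothetical, trimmed-down struct page. */
struct model_page {
	unsigned long flags;
	struct model_list_head lru;
};

/* Hypothetical ptdesc that overlays model_page. */
struct model_ptdesc {
	unsigned long page_flags;
	struct model_list_head pt_list;
};

/* While the overlay holds, pt_list and lru must be the same bits. */
static_assert(offsetof(struct model_page, lru) ==
	      offsetof(struct model_ptdesc, pt_list),
	      "pt_list must alias lru while ptdesc overlays struct page");
static_assert(sizeof(struct model_ptdesc) <= sizeof(struct model_page),
	      "ptdesc must fit inside the page it overlays");
```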


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-11-20 13:57 UTC | newest]

Thread overview: 16+ messages
2025-11-13 14:04 [PATCH 0/4] Convert pgtable to use frozen pages Matthew Wilcox (Oracle)
2025-11-13 14:04 ` [PATCH 1/4] mm: Use frozen pages for page tables Matthew Wilcox (Oracle)
2025-11-13 18:24   ` Vishal Moola (Oracle)
2025-11-13 19:14     ` Vishal Moola (Oracle)
2025-11-14 13:45       ` Matthew Wilcox
2025-11-14 14:31       ` Will Deacon
2025-11-17 14:38   ` kernel test robot
2025-11-18  0:44     ` Vishal Moola (Oracle)
2025-11-19 15:46   ` Chih-En Lin
2025-11-20 13:55     ` David Hildenbrand (Red Hat)
2025-11-13 14:04 ` [PATCH 2/4] mm: Account pagetable memory when allocated Matthew Wilcox (Oracle)
2025-11-13 19:39   ` Vishal Moola (Oracle)
2025-11-13 14:04 ` [PATCH 3/4] mm: Mark " Matthew Wilcox (Oracle)
2025-11-18 17:00   ` David Hildenbrand (Red Hat)
2025-11-13 14:04 ` [PATCH 4/4] pgtable: Remove uses of page->lru Matthew Wilcox (Oracle)
2025-11-20 13:56   ` David Hildenbrand (Red Hat)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox