* [PATCH v5 0/4] Convert 64-bit x86/mm/pat to ptdescs
@ 2026-02-11 19:52 Vishal Moola (Oracle)
2026-02-11 19:52 ` [PATCH v5 1/4] mm: Add address apis for ptdescs Vishal Moola (Oracle)
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: Vishal Moola (Oracle) @ 2026-02-11 19:52 UTC (permalink / raw)
To: linux-kernel, linux-mm, x86, Mike Rapoport (Microsoft)
Cc: akpm, Matthew Wilcox (Oracle),
Dave Hansen, Andy Lutomirski, Peter Zijlstra,
Vishal Moola (Oracle)
x86/mm/pat should be using ptdescs. One line has already been
converted to pagetable_free(), while the allocation sites use
get_free_pages(). This mismatch causes problems when allocating ptdescs
separately from struct page.
The first patch introduces new ptdesc APIs that operate on addresses.
These are helper functions analogous to get_free_pages() and free_pages().
The remaining patches convert the allocation/free sites to use ptdescs. In
the short term, this helps enable Matthew's work to allocate frozen
pagetables[1]. And in the long term, this will help us cleanly split
ptdesc allocations from struct page[2].
The pgd_list should also be using ptdescs (for 32-bit in this file). This
can be done in a different patchset since there are other users of pgd_list
that still need to be converted.
[1] https://lore.kernel.org/linux-mm/20251113140448.1814860-1-willy@infradead.org/
[2] https://lore.kernel.org/linux-mm/20251020001652.2116669-1-willy@infradead.org/
------
Based on current mm-new.
v5:
- Return a void pointer instead of unsigned long in allocation
- More imperative voice in commit logs
Vishal Moola (Oracle) (4):
mm: Add address apis for ptdescs
x86/mm/pat: Convert pte code to use ptdescs
x86/mm/pat: Convert pmd code to use ptdescs
x86/mm/pat: Convert split_large_page() to use ptdescs
arch/x86/mm/pat/set_memory.c | 49 ++++++++++++++++++------------------
include/linux/mm.h | 4 +++
mm/memory.c | 34 +++++++++++++++++++++++++
3 files changed, 63 insertions(+), 24 deletions(-)
--
2.52.0
* [PATCH v5 1/4] mm: Add address apis for ptdescs
2026-02-11 19:52 [PATCH v5 0/4] Convert 64-bit x86/mm/pat to ptdescs Vishal Moola (Oracle)
@ 2026-02-11 19:52 ` Vishal Moola (Oracle)
2026-02-11 20:13 ` Dave Hansen
2026-02-11 19:52 ` [PATCH v5 2/4] x86/mm/pat: Convert pte code to use ptdescs Vishal Moola (Oracle)
` (2 subsequent siblings)
3 siblings, 1 reply; 18+ messages in thread
From: Vishal Moola (Oracle) @ 2026-02-11 19:52 UTC (permalink / raw)
To: linux-kernel, linux-mm, x86, Mike Rapoport (Microsoft)
Cc: akpm, Matthew Wilcox (Oracle),
Dave Hansen, Andy Lutomirski, Peter Zijlstra,
Vishal Moola (Oracle),
Dave Hansen
Architectures frequently only care about the address associated with a
page table. The current ptdesc API forces callers to acquire a ptdesc to
use it. Add more APIs to abstract ptdescs away from architectures that
don't need the descriptor.
Add pgtable_alloc_addr() and pgtable_free_addr() to operate on the
underlying addresses associated with page table descriptors, similar to
get_free_pages() and free_pages(). Zero the allocations since
there's no reason to want a page table with stale data.
Have pgtable_alloc_addr() return a void pointer. This will simplify code
for callers since they all want pointers.
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
---
include/linux/mm.h | 4 ++++
mm/memory.c | 34 ++++++++++++++++++++++++++++++++++
2 files changed, 38 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f8a8fd47399c..9b6d3d910990 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3419,6 +3419,10 @@ static inline void __pagetable_free(struct ptdesc *pt)
__free_pages(page, compound_order(page));
}
+void *pgtable_alloc_addr_noprof(gfp_t gfp, unsigned int order);
+#define pgtable_alloc_addr(...) alloc_hooks(pgtable_alloc_addr_noprof(__VA_ARGS__))
+void pgtable_free_addr(const void *addr);
+
#ifdef CONFIG_ASYNC_KERNEL_PGTABLE_FREE
void pagetable_free_kernel(struct ptdesc *pt);
#else
diff --git a/mm/memory.c b/mm/memory.c
index 1a26947ed8cd..b9653377d647 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -7452,6 +7452,40 @@ long copy_folio_from_user(struct folio *dst_folio,
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
+/**
+ * pgtable_alloc_addr - Allocate pagetables to get an address
+ * @gfp: GFP flags
+ * @order: desired pagetable order
+ *
+ * pgtable_alloc_addr is like pagetable_alloc. This is for callers who only want a
+ * page table's address, not its ptdesc.
+ *
+ * Return: The address associated with the allocated page table, or NULL on
+ * failure.
+ */
+void *pgtable_alloc_addr_noprof(gfp_t gfp, unsigned int order)
+{
+ struct ptdesc *ptdesc = pagetable_alloc_noprof(gfp | __GFP_ZERO, order);
+
+ if (!ptdesc)
+ return NULL;
+ return ptdesc_address(ptdesc);
+}
+
+/**
+ * pgtable_free_addr - Free pagetables by address
+ * @addr: The virtual address from pgtable_alloc_addr()
+ *
+ * This function is for callers who have the address but no ptdesc. If you
+ * have the ptdesc, use pagetable_free() instead.
+ */
+void pgtable_free_addr(const void *addr)
+{
+ struct ptdesc *ptdesc = virt_to_ptdesc(addr);
+
+ pagetable_free(ptdesc);
+}
+
#if defined(CONFIG_SPLIT_PTE_PTLOCKS) && ALLOC_SPLIT_PTLOCKS
static struct kmem_cache *page_ptl_cachep;
--
2.52.0
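
Editorial sketch (not part of the patch): the relationship between the
descriptor-based and address-based APIs described above can be modelled in
userspace. Everything prefixed with model_ is hypothetical and exists only for
illustration; the fixed-size arrays stand in for the kernel's memory map, and
the order argument is left out for brevity. Only the shape of the wrappers
mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_NR_PAGES  8u

/* Hypothetical stand-in for struct ptdesc: one descriptor per page. */
struct model_ptdesc {
	int in_use;
};

/* Backing "memory" and its descriptor array (a toy memory map). */
static unsigned char model_mem[MODEL_NR_PAGES][MODEL_PAGE_SIZE];
static struct model_ptdesc model_descs[MODEL_NR_PAGES];

/* Plays the role of pagetable_alloc(): hands back a descriptor. */
static struct model_ptdesc *model_pagetable_alloc(void)
{
	for (size_t i = 0; i < MODEL_NR_PAGES; i++) {
		if (!model_descs[i].in_use) {
			model_descs[i].in_use = 1;
			return &model_descs[i];
		}
	}
	return NULL;
}

/* Plays the role of pagetable_free(). */
static void model_pagetable_free(struct model_ptdesc *pt)
{
	pt->in_use = 0;
}

/* Plays the role of ptdesc_address(): descriptor -> table address. */
static void *model_ptdesc_address(const struct model_ptdesc *pt)
{
	return model_mem[pt - model_descs];
}

/* Plays the role of virt_to_ptdesc(): table address -> descriptor. */
static struct model_ptdesc *model_virt_to_ptdesc(const void *addr)
{
	size_t off = (size_t)((const unsigned char *)addr - &model_mem[0][0]);

	return &model_descs[off / MODEL_PAGE_SIZE];
}

/* Address-based allocation: always zeroed, NULL on failure. */
void *model_pgtable_alloc_addr(void)
{
	struct model_ptdesc *pt = model_pagetable_alloc();
	void *addr;

	if (!pt)
		return NULL;
	addr = model_ptdesc_address(pt);
	memset(addr, 0, MODEL_PAGE_SIZE);
	return addr;
}

/* Address-based free: look the descriptor up, then free through it. */
void model_pgtable_free_addr(const void *addr)
{
	model_pagetable_free(model_virt_to_ptdesc(addr));
}
```

A caller in this model only ever sees the void pointer; the descriptor is
looked up again at free time, which is the property the commit message asks
for.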
* [PATCH v5 2/4] x86/mm/pat: Convert pte code to use ptdescs
2026-02-11 19:52 [PATCH v5 0/4] Convert 64-bit x86/mm/pat to ptdescs Vishal Moola (Oracle)
2026-02-11 19:52 ` [PATCH v5 1/4] mm: Add address apis for ptdescs Vishal Moola (Oracle)
@ 2026-02-11 19:52 ` Vishal Moola (Oracle)
2026-02-11 21:55 ` Matthew Wilcox
2026-02-11 19:52 ` [PATCH v5 3/4] x86/mm/pat: Convert pmd " Vishal Moola (Oracle)
2026-02-11 19:52 ` [PATCH v5 4/4] x86/mm/pat: Convert split_large_page() " Vishal Moola (Oracle)
3 siblings, 1 reply; 18+ messages in thread
From: Vishal Moola (Oracle) @ 2026-02-11 19:52 UTC (permalink / raw)
To: linux-kernel, linux-mm, x86, Mike Rapoport (Microsoft)
Cc: akpm, Matthew Wilcox (Oracle),
Dave Hansen, Andy Lutomirski, Peter Zijlstra,
Vishal Moola (Oracle)
We need all allocation and free sites to use the ptdesc APIs in order to
allocate them separately from regular pages. Convert these pte
allocation/free sites to use ptdescs.
Also, rename *_pte_page() functions to *_pte(). Rename them now to avoid
any confusion later. Eventually these allocations will be backed by a
ptdesc not a page, but that's not important to callers either.
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
---
arch/x86/mm/pat/set_memory.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 6c6eb486f7a6..04eae65aedfc 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -1400,7 +1400,7 @@ static int collapse_large_pages(unsigned long addr, struct list_head *pgtables)
return collapsed;
}
-static bool try_to_free_pte_page(pte_t *pte)
+static bool try_to_free_pte(pte_t *pte)
{
int i;
@@ -1408,7 +1408,7 @@ static bool try_to_free_pte_page(pte_t *pte)
if (!pte_none(pte[i]))
return false;
- free_page((unsigned long)pte);
+ pgtable_free_addr(pte);
return true;
}
@@ -1435,7 +1435,7 @@ static bool unmap_pte_range(pmd_t *pmd, unsigned long start, unsigned long end)
pte++;
}
- if (try_to_free_pte_page((pte_t *)pmd_page_vaddr(*pmd))) {
+ if (try_to_free_pte((pte_t *)pmd_page_vaddr(*pmd))) {
pmd_clear(pmd);
return true;
}
@@ -1537,9 +1537,9 @@ static void unmap_pud_range(p4d_t *p4d, unsigned long start, unsigned long end)
*/
}
-static int alloc_pte_page(pmd_t *pmd)
+static int alloc_pte(pmd_t *pmd)
{
- pte_t *pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+ pte_t *pte = pgtable_alloc_addr(GFP_KERNEL, 0);
if (!pte)
return -1;
@@ -1600,7 +1600,7 @@ static long populate_pmd(struct cpa_data *cpa,
*/
pmd = pmd_offset(pud, start);
if (pmd_none(*pmd))
- if (alloc_pte_page(pmd))
+ if (alloc_pte(pmd))
return -1;
populate_pte(cpa, start, pre_end, cur_pages, pmd, pgprot);
@@ -1641,7 +1641,7 @@ static long populate_pmd(struct cpa_data *cpa,
if (start < end) {
pmd = pmd_offset(pud, start);
if (pmd_none(*pmd))
- if (alloc_pte_page(pmd))
+ if (alloc_pte(pmd))
return -1;
populate_pte(cpa, start, end, num_pages - cur_pages,
--
2.52.0
* [PATCH v5 3/4] x86/mm/pat: Convert pmd code to use ptdescs
2026-02-11 19:52 [PATCH v5 0/4] Convert 64-bit x86/mm/pat to ptdescs Vishal Moola (Oracle)
2026-02-11 19:52 ` [PATCH v5 1/4] mm: Add address apis for ptdescs Vishal Moola (Oracle)
2026-02-11 19:52 ` [PATCH v5 2/4] x86/mm/pat: Convert pte code to use ptdescs Vishal Moola (Oracle)
@ 2026-02-11 19:52 ` Vishal Moola (Oracle)
2026-02-11 20:07 ` Dave Hansen
2026-02-11 19:52 ` [PATCH v5 4/4] x86/mm/pat: Convert split_large_page() " Vishal Moola (Oracle)
3 siblings, 1 reply; 18+ messages in thread
From: Vishal Moola (Oracle) @ 2026-02-11 19:52 UTC (permalink / raw)
To: linux-kernel, linux-mm, x86, Mike Rapoport (Microsoft)
Cc: akpm, Matthew Wilcox (Oracle),
Dave Hansen, Andy Lutomirski, Peter Zijlstra,
Vishal Moola (Oracle)
We need all allocation and free sites to use the ptdesc APIs in order to
allocate them separately from regular pages. Convert these pmd
allocation/free sites to use ptdescs.
Allocate ptdescs in populate_pgd() as well since those allocations may
later be freed by try_to_free_pmd_page().
Also, rename *_pmd_page() functions to *_pmd(). Rename them now to avoid
any confusion later. Eventually these allocations will be backed by a
ptdesc not a page, but that's not important to callers either.
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
---
arch/x86/mm/pat/set_memory.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 04eae65aedfc..9d6681443e54 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -1412,7 +1412,7 @@ static bool try_to_free_pte(pte_t *pte)
return true;
}
-static bool try_to_free_pmd_page(pmd_t *pmd)
+static bool try_to_free_pmd(pmd_t *pmd)
{
int i;
@@ -1420,7 +1420,7 @@ static bool try_to_free_pmd_page(pmd_t *pmd)
if (!pmd_none(pmd[i]))
return false;
- free_page((unsigned long)pmd);
+ pgtable_free_addr(pmd);
return true;
}
@@ -1446,7 +1446,7 @@ static void __unmap_pmd_range(pud_t *pud, pmd_t *pmd,
unsigned long start, unsigned long end)
{
if (unmap_pte_range(pmd, start, end))
- if (try_to_free_pmd_page(pud_pgtable(*pud)))
+ if (try_to_free_pmd(pud_pgtable(*pud)))
pud_clear(pud);
}
@@ -1490,7 +1490,7 @@ static void unmap_pmd_range(pud_t *pud, unsigned long start, unsigned long end)
* Try again to free the PMD page if haven't succeeded above.
*/
if (!pud_none(*pud))
- if (try_to_free_pmd_page(pud_pgtable(*pud)))
+ if (try_to_free_pmd(pud_pgtable(*pud)))
pud_clear(pud);
}
@@ -1547,9 +1547,9 @@ static int alloc_pte(pmd_t *pmd)
return 0;
}
-static int alloc_pmd_page(pud_t *pud)
+static int alloc_pmd(pud_t *pud)
{
- pmd_t *pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+ pmd_t *pmd = pgtable_alloc_addr(GFP_KERNEL, 0);
if (!pmd)
return -1;
@@ -1622,7 +1622,7 @@ static long populate_pmd(struct cpa_data *cpa,
* We cannot use a 1G page so allocate a PMD page if needed.
*/
if (pud_none(*pud))
- if (alloc_pmd_page(pud))
+ if (alloc_pmd(pud))
return -1;
pmd = pmd_offset(pud, start);
@@ -1678,7 +1678,7 @@ static int populate_pud(struct cpa_data *cpa, unsigned long start, p4d_t *p4d,
* Need a PMD page?
*/
if (pud_none(*pud))
- if (alloc_pmd_page(pud))
+ if (alloc_pmd(pud))
return -1;
cur_pages = populate_pmd(cpa, start, pre_end, cur_pages,
@@ -1715,7 +1715,7 @@ static int populate_pud(struct cpa_data *cpa, unsigned long start, p4d_t *p4d,
pud = pud_offset(p4d, start);
if (pud_none(*pud))
- if (alloc_pmd_page(pud))
+ if (alloc_pmd(pud))
return -1;
tmp = populate_pmd(cpa, start, end, cpa->numpages - cur_pages,
@@ -1743,7 +1743,7 @@ static int populate_pgd(struct cpa_data *cpa, unsigned long addr)
pgd_entry = cpa->pgd + pgd_index(addr);
if (pgd_none(*pgd_entry)) {
- p4d = (p4d_t *)get_zeroed_page(GFP_KERNEL);
+ p4d = pgtable_alloc_addr(GFP_KERNEL, 0);
if (!p4d)
return -1;
@@ -1755,7 +1755,7 @@ static int populate_pgd(struct cpa_data *cpa, unsigned long addr)
*/
p4d = p4d_offset(pgd_entry, addr);
if (p4d_none(*p4d)) {
- pud = (pud_t *)get_zeroed_page(GFP_KERNEL);
+ pud = pgtable_alloc_addr(GFP_KERNEL, 0);
if (!pud)
return -1;
--
2.52.0
* [PATCH v5 4/4] x86/mm/pat: Convert split_large_page() to use ptdescs
2026-02-11 19:52 [PATCH v5 0/4] Convert 64-bit x86/mm/pat to ptdescs Vishal Moola (Oracle)
` (2 preceding siblings ...)
2026-02-11 19:52 ` [PATCH v5 3/4] x86/mm/pat: Convert pmd " Vishal Moola (Oracle)
@ 2026-02-11 19:52 ` Vishal Moola (Oracle)
2026-02-11 21:59 ` Matthew Wilcox
3 siblings, 1 reply; 18+ messages in thread
From: Vishal Moola (Oracle) @ 2026-02-11 19:52 UTC (permalink / raw)
To: linux-kernel, linux-mm, x86, Mike Rapoport (Microsoft)
Cc: akpm, Matthew Wilcox (Oracle),
Dave Hansen, Andy Lutomirski, Peter Zijlstra,
Vishal Moola (Oracle)
In order to separately allocate ptdescs from pages, we need all allocation
and free sites to use the appropriate functions.
split_large_page() allocates a page to be used as a page table. This
should be allocating a ptdesc, so convert it.
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
arch/x86/mm/pat/set_memory.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 9d6681443e54..dfaec7b16ac4 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -1119,9 +1119,10 @@ static void split_set_pte(struct cpa_data *cpa, pte_t *pte, unsigned long pfn,
static int
__split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
- struct page *base)
+ struct ptdesc *ptdesc)
{
unsigned long lpaddr, lpinc, ref_pfn, pfn, pfninc = 1;
+ struct page *base = ptdesc_page(ptdesc);
pte_t *pbase = (pte_t *)page_address(base);
unsigned int i, level;
pgprot_t ref_prot;
@@ -1226,18 +1227,18 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
static int split_large_page(struct cpa_data *cpa, pte_t *kpte,
unsigned long address)
{
- struct page *base;
+ struct ptdesc *ptdesc;
if (!debug_pagealloc_enabled())
spin_unlock(&cpa_lock);
- base = alloc_pages(GFP_KERNEL, 0);
+ ptdesc = pagetable_alloc(GFP_KERNEL, 0);
if (!debug_pagealloc_enabled())
spin_lock(&cpa_lock);
- if (!base)
+ if (!ptdesc)
return -ENOMEM;
- if (__split_large_page(cpa, kpte, address, base))
- __free_page(base);
+ if (__split_large_page(cpa, kpte, address, ptdesc))
+ pagetable_free(ptdesc);
return 0;
}
--
2.52.0
* Re: [PATCH v5 3/4] x86/mm/pat: Convert pmd code to use ptdescs
2026-02-11 19:52 ` [PATCH v5 3/4] x86/mm/pat: Convert pmd " Vishal Moola (Oracle)
@ 2026-02-11 20:07 ` Dave Hansen
2026-02-11 21:45 ` Vishal Moola (Oracle)
0 siblings, 1 reply; 18+ messages in thread
From: Dave Hansen @ 2026-02-11 20:07 UTC (permalink / raw)
To: Vishal Moola (Oracle),
linux-kernel, linux-mm, x86, Mike Rapoport (Microsoft)
Cc: akpm, Matthew Wilcox (Oracle),
Dave Hansen, Andy Lutomirski, Peter Zijlstra
On 2/11/26 11:52, Vishal Moola (Oracle) wrote:
> Also, rename *_pmd_page() functions to *_pmd(). Rename them now to avoid
> any confusion later. Eventually these allocations will be backed by a
> ptdesc not a page, but that's not important to callers either.
So, pages are still a thing, whether the API says 'struct page' or not.
The hardware kinda uses pages for page tables. ;)
Could we please leave out the unnecessary churn from the renames?
* Re: [PATCH v5 1/4] mm: Add address apis for ptdescs
2026-02-11 19:52 ` [PATCH v5 1/4] mm: Add address apis for ptdescs Vishal Moola (Oracle)
@ 2026-02-11 20:13 ` Dave Hansen
2026-02-11 21:54 ` Matthew Wilcox
2026-02-11 22:18 ` Vishal Moola (Oracle)
0 siblings, 2 replies; 18+ messages in thread
From: Dave Hansen @ 2026-02-11 20:13 UTC (permalink / raw)
To: Vishal Moola (Oracle),
linux-kernel, linux-mm, x86, Mike Rapoport (Microsoft)
Cc: akpm, Matthew Wilcox (Oracle),
Dave Hansen, Andy Lutomirski, Peter Zijlstra
On 2/11/26 11:52, Vishal Moola (Oracle) wrote:
> +/**
> + * pgtable_alloc_addr - Allocate pagetables to get an address
> + * @gfp: GFP flags
> + * @order: desired pagetable order
FWIW, I don't like how pgtable_alloc_addr() looks in practice. It reads
like it is: "allocate a page table address", not "allocate a page
table". I don't have a better suggestion other than having:
pgtable_alloc()
that returns a page table pointer, a void*, and:
ptdesc_alloc()
which returns a ptdesc*. But I suspect that would get confusing at the
point that ptdescs _themselves_ start getting allocated.
* Re: [PATCH v5 3/4] x86/mm/pat: Convert pmd code to use ptdescs
2026-02-11 20:07 ` Dave Hansen
@ 2026-02-11 21:45 ` Vishal Moola (Oracle)
0 siblings, 0 replies; 18+ messages in thread
From: Vishal Moola (Oracle) @ 2026-02-11 21:45 UTC (permalink / raw)
To: Dave Hansen
Cc: linux-kernel, linux-mm, x86, Mike Rapoport (Microsoft),
akpm, Matthew Wilcox (Oracle),
Dave Hansen, Andy Lutomirski, Peter Zijlstra
On Wed, Feb 11, 2026 at 12:07:05PM -0800, Dave Hansen wrote:
> On 2/11/26 11:52, Vishal Moola (Oracle) wrote:
> > Also, rename *_pmd_page() functions to *_pmd(). Rename them now to avoid
> > any confusion later. Eventually these allocations will be backed by a
> > ptdesc not a page, but that's not important to callers either.
>
> So, pages are still a thing, whether the API says 'struct page' or not.
> The hardware kinda uses pages for page tables. ;)
>
> Could we please leave out the unnecessary churn from the renames?
Yeah I'll drop the renames. I thought it'd be better for
correctness' sake, as the idea is to "allocate memory for a page table"
rather than "allocate a generic page for a page table."
As long as it's using the right APIs the naming doesn't really matter
to me.
* Re: [PATCH v5 1/4] mm: Add address apis for ptdescs
2026-02-11 20:13 ` Dave Hansen
@ 2026-02-11 21:54 ` Matthew Wilcox
2026-02-11 22:18 ` Vishal Moola (Oracle)
1 sibling, 0 replies; 18+ messages in thread
From: Matthew Wilcox @ 2026-02-11 21:54 UTC (permalink / raw)
To: Dave Hansen
Cc: Vishal Moola (Oracle),
linux-kernel, linux-mm, x86, Mike Rapoport (Microsoft),
akpm, Dave Hansen, Andy Lutomirski, Peter Zijlstra
On Wed, Feb 11, 2026 at 12:13:10PM -0800, Dave Hansen wrote:
> On 2/11/26 11:52, Vishal Moola (Oracle) wrote:
> > +/**
> > + * pgtable_alloc_addr - Allocate pagetables to get an address
> > + * @gfp: GFP flags
> > + * @order: desired pagetable order
>
> FWIW, I don't like how pgtable_alloc_addr() looks in practice. It reads
> like it is: "allocate a page table address", not "allocate a page
> table". I don't have a better suggestion other than having:
>
> pgtable_alloc()
>
> that returns a page table pointer, a void*, and:
>
> ptdesc_alloc()
>
> which returns a ptdesc*. But I suspect that would get confusing at the
> point that ptdescs _themselves_ start getting allocated.
I think that's fine and consistent with folio_alloc(). Internally to
ptdesc_alloc(), it'll use a kmem_cache_alloc(), so there won't be
any confusion.
* Re: [PATCH v5 2/4] x86/mm/pat: Convert pte code to use ptdescs
2026-02-11 19:52 ` [PATCH v5 2/4] x86/mm/pat: Convert pte code to use ptdescs Vishal Moola (Oracle)
@ 2026-02-11 21:55 ` Matthew Wilcox
2026-02-11 22:23 ` Vishal Moola (Oracle)
0 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2026-02-11 21:55 UTC (permalink / raw)
To: Vishal Moola (Oracle)
Cc: linux-kernel, linux-mm, x86, Mike Rapoport (Microsoft),
akpm, Dave Hansen, Andy Lutomirski, Peter Zijlstra
On Wed, Feb 11, 2026 at 11:52:31AM -0800, Vishal Moola (Oracle) wrote:
> -static bool try_to_free_pte_page(pte_t *pte)
> +static bool try_to_free_pte(pte_t *pte)
I don't like this name though. You're not freeing a single PTE,
you're freeing a level of page tables. How about
try_to_free_pte_table()?
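
Editorial sketch (not part of the thread): the point about freeing a whole
level rather than a single entry can be shown with a small userspace model.
The model_ names, the 512-entry table size, and the "entry is zero" test are
assumptions made for illustration; the real pte_none() on x86 is more
involved, and the real code frees through the page table APIs rather than
free().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_PTRS_PER_PTE 512

typedef uint64_t model_pte_t;

/* Simplified emptiness test: an entry with no bits set maps nothing. */
static bool model_pte_none(model_pte_t pte)
{
	return pte == 0;
}

/*
 * Free one level of page table, but only when every entry in it is
 * empty. Returns true if the table was freed, false if it is still
 * in use and was left untouched.
 */
bool model_try_to_free_pte_table(model_pte_t *table)
{
	for (int i = 0; i < MODEL_PTRS_PER_PTE; i++)
		if (!model_pte_none(table[i]))
			return false;

	free(table);
	return true;
}
```

The argument is the base of the table, and the function's result decides
whether the caller may clear the parent entry, as unmap_pte_range() does with
pmd_clear() in the quoted diff.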
* Re: [PATCH v5 4/4] x86/mm/pat: Convert split_large_page() to use ptdescs
2026-02-11 19:52 ` [PATCH v5 4/4] x86/mm/pat: Convert split_large_page() " Vishal Moola (Oracle)
@ 2026-02-11 21:59 ` Matthew Wilcox
2026-02-11 22:38 ` Vishal Moola (Oracle)
0 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2026-02-11 21:59 UTC (permalink / raw)
To: Vishal Moola (Oracle)
Cc: linux-kernel, linux-mm, x86, Mike Rapoport (Microsoft),
akpm, Dave Hansen, Andy Lutomirski, Peter Zijlstra
On Wed, Feb 11, 2026 at 11:52:33AM -0800, Vishal Moola (Oracle) wrote:
> static int
> __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
> - struct page *base)
> + struct ptdesc *ptdesc)
> {
> unsigned long lpaddr, lpinc, ref_pfn, pfn, pfninc = 1;
> + struct page *base = ptdesc_page(ptdesc);
> pte_t *pbase = (pte_t *)page_address(base);
We have ptdesc_address() already. Can we avoid the other uses of
'base' in this function?
* Re: [PATCH v5 1/4] mm: Add address apis for ptdescs
2026-02-11 20:13 ` Dave Hansen
2026-02-11 21:54 ` Matthew Wilcox
@ 2026-02-11 22:18 ` Vishal Moola (Oracle)
2026-02-12 0:07 ` Vishal Moola (Oracle)
1 sibling, 1 reply; 18+ messages in thread
From: Vishal Moola (Oracle) @ 2026-02-11 22:18 UTC (permalink / raw)
To: Dave Hansen
Cc: linux-kernel, linux-mm, x86, Mike Rapoport (Microsoft),
akpm, Matthew Wilcox (Oracle),
Dave Hansen, Andy Lutomirski, Peter Zijlstra
On Wed, Feb 11, 2026 at 12:13:10PM -0800, Dave Hansen wrote:
> On 2/11/26 11:52, Vishal Moola (Oracle) wrote:
> > +/**
> > + * pgtable_alloc_addr - Allocate pagetables to get an address
> > + * @gfp: GFP flags
> > + * @order: desired pagetable order
>
> FWIW, I don't like how pgtable_alloc_addr() looks in practice. It reads
> like it is: "allocate a page table address", not "allocate a page
> table". I don't have a better suggestion other than having:
Hmmm. I meant for it to read "allocate a page table and get its address."
> pgtable_alloc()
>
> that returns a page table pointer, a void*, and:
Initially, I intended to name it pgtable_alloc() & pgtable_free(). I saw
arm using pgtable_alloc() and powerpc using pgtable_free(), so I looked
for another name.
> ptdesc_alloc()
>
> which returns a ptdesc*. But I suspect that would get confusing at the
> point that ptdescs _themselves_ start getting allocated.
The ptdesc_alloc() equivalent right now is named pagetable_alloc(), so I
don't think it'd get confusing.
* Re: [PATCH v5 2/4] x86/mm/pat: Convert pte code to use ptdescs
2026-02-11 21:55 ` Matthew Wilcox
@ 2026-02-11 22:23 ` Vishal Moola (Oracle)
2026-02-11 23:04 ` Dave Hansen
0 siblings, 1 reply; 18+ messages in thread
From: Vishal Moola (Oracle) @ 2026-02-11 22:23 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-kernel, linux-mm, x86, Mike Rapoport (Microsoft),
akpm, Dave Hansen, Andy Lutomirski, Peter Zijlstra
On Wed, Feb 11, 2026 at 09:55:51PM +0000, Matthew Wilcox wrote:
> On Wed, Feb 11, 2026 at 11:52:31AM -0800, Vishal Moola (Oracle) wrote:
> > -static bool try_to_free_pte_page(pte_t *pte)
> > +static bool try_to_free_pte(pte_t *pte)
>
> I don't like this name though. You're not freeing a single PTE,
> you're freeing a level of page tables. How about
> try_to_free_pte_table()?
Ah, right. That would make sense to me.
Dave doesn't want the renaming at all[1], so I'm planning to leave it as
try_to_free_pte_page() though.
[1] https://lore.kernel.org/linux-mm/20260211195233.368497-1-vishal.moola@gmail.com/T/#m72f1d8d91b16f0693cca4271cc8685d2337372d7
* Re: [PATCH v5 4/4] x86/mm/pat: Convert split_large_page() to use ptdescs
2026-02-11 21:59 ` Matthew Wilcox
@ 2026-02-11 22:38 ` Vishal Moola (Oracle)
0 siblings, 0 replies; 18+ messages in thread
From: Vishal Moola (Oracle) @ 2026-02-11 22:38 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-kernel, linux-mm, x86, Mike Rapoport (Microsoft),
akpm, Dave Hansen, Andy Lutomirski, Peter Zijlstra
On Wed, Feb 11, 2026 at 09:59:19PM +0000, Matthew Wilcox wrote:
> On Wed, Feb 11, 2026 at 11:52:33AM -0800, Vishal Moola (Oracle) wrote:
> > static int
> > __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
> > - struct page *base)
> > + struct ptdesc *ptdesc)
> > {
> > unsigned long lpaddr, lpinc, ref_pfn, pfn, pfninc = 1;
> > + struct page *base = ptdesc_page(ptdesc);
> > pte_t *pbase = (pte_t *)page_address(base);
>
> We have ptdesc_address() already. Can we avoid the other uses of
> 'base' in this function?
We could, but not without helpers similar to folio_pfn() and
folio_mk_pte(). I hadn't added those in this patchset since my primary
goal right now is to ensure all the allocation/free sites are using the
proper APIs.
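
Editorial sketch (not part of the thread): to make the missing pieces
concrete, helpers in the spirit of folio_pfn() and folio_mk_pte() might look
roughly like the userspace model below. No such ptdesc helpers are proposed in
this series; model_ptdesc_pfn(), model_ptdesc_mk_pte(), the descriptor layout,
and the 12-bit page shift are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12

/* Hypothetical descriptor that only records the table's physical address. */
struct model_ptdesc {
	uint64_t phys;
};

/* Hypothetical analogue of folio_pfn(): page frame number of the table. */
static inline uint64_t model_ptdesc_pfn(const struct model_ptdesc *pt)
{
	return pt->phys >> MODEL_PAGE_SHIFT;
}

/* Hypothetical analogue of folio_mk_pte(): frame number plus protection bits. */
static inline uint64_t model_ptdesc_mk_pte(const struct model_ptdesc *pt,
					   uint64_t prot)
{
	return (model_ptdesc_pfn(pt) << MODEL_PAGE_SHIFT) | prot;
}
```

With helpers of this kind, a function like __split_large_page() would not
need to go back to a struct page to build the entry that points at the new
table.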
* Re: [PATCH v5 2/4] x86/mm/pat: Convert pte code to use ptdescs
2026-02-11 22:23 ` Vishal Moola (Oracle)
@ 2026-02-11 23:04 ` Dave Hansen
0 siblings, 0 replies; 18+ messages in thread
From: Dave Hansen @ 2026-02-11 23:04 UTC (permalink / raw)
To: Vishal Moola (Oracle), Matthew Wilcox
Cc: linux-kernel, linux-mm, x86, Mike Rapoport (Microsoft),
akpm, Dave Hansen, Andy Lutomirski, Peter Zijlstra
On 2/11/26 14:23, Vishal Moola (Oracle) wrote:
> On Wed, Feb 11, 2026 at 09:55:51PM +0000, Matthew Wilcox wrote:
>> On Wed, Feb 11, 2026 at 11:52:31AM -0800, Vishal Moola (Oracle) wrote:
>>> -static bool try_to_free_pte_page(pte_t *pte)
>>> +static bool try_to_free_pte(pte_t *pte)
>> I don't like this name though. You're not freeing a single PTE,
>> you're freeing a level of page tables. How about
>> try_to_free_pte_table()?
> Ah, right. That would make sense to me.
>
> Dave doesn't want the renaming at all[1], so I'm planning to leave it as
> try_to_free_pte_page() though.
Let's just talk about renaming later, please. I'm not totally against
it, but there is enough going on in this set already.
* Re: [PATCH v5 1/4] mm: Add address apis for ptdescs
2026-02-11 22:18 ` Vishal Moola (Oracle)
@ 2026-02-12 0:07 ` Vishal Moola (Oracle)
2026-02-18 20:23 ` Vishal Moola (Oracle)
0 siblings, 1 reply; 18+ messages in thread
From: Vishal Moola (Oracle) @ 2026-02-12 0:07 UTC (permalink / raw)
To: Dave Hansen
Cc: linux-kernel, linux-mm, x86, Mike Rapoport (Microsoft),
akpm, Matthew Wilcox (Oracle),
Dave Hansen, Andy Lutomirski, Peter Zijlstra
On Wed, Feb 11, 2026 at 02:18:20PM -0800, Vishal Moola (Oracle) wrote:
> On Wed, Feb 11, 2026 at 12:13:10PM -0800, Dave Hansen wrote:
> > On 2/11/26 11:52, Vishal Moola (Oracle) wrote:
> > > +/**
> > > + * pgtable_alloc_addr - Allocate pagetables to get an address
> > > + * @gfp: GFP flags
> > > + * @order: desired pagetable order
> >
> > FWIW, I don't like how pgtable_alloc_addr() looks in practice. It reads
> > like it is: "allocate a page table address", not "allocate a page
> > table". I don't have a better suggestion other than having:
>
> Hmmm. I meant for it to read "allocate a page table and get its address."
>
> > pgtable_alloc()
> >
> > that returns a page table pointer, a void*, and:
>
> Initially, I intended to name it pgtable_alloc() & pgtable_free(). I saw
> arm using pgtable_alloc() and powerpc using pgtable_free(), so I looked
> for another name.
I've done some digging about these names.
The arm cases use a function pointer, so we should be able to use that
name without issue.
What do you think is a reasonable name for freeing?
pgtable_free() is defined for sparc and powerpc. I could rename them
prefixed with "__" to get the name since they only have 1-2 internal
callers.
> > ptdesc_alloc()
> >
> > which returns a ptdesc*. But I suspect that would get confusing at the
> > point that ptdescs _themselves_ start getting allocated.
>
> The ptdesc_alloc() equivalent right now is named pagetable_alloc(), so I
> don't think it'd get confusing.
* Re: [PATCH v5 1/4] mm: Add address apis for ptdescs
2026-02-12 0:07 ` Vishal Moola (Oracle)
@ 2026-02-18 20:23 ` Vishal Moola (Oracle)
2026-02-18 20:27 ` Dave Hansen
0 siblings, 1 reply; 18+ messages in thread
From: Vishal Moola (Oracle) @ 2026-02-18 20:23 UTC (permalink / raw)
To: Dave Hansen
Cc: linux-kernel, linux-mm, x86, Mike Rapoport (Microsoft),
akpm, Matthew Wilcox (Oracle),
Dave Hansen, Andy Lutomirski, Peter Zijlstra
On Wed, Feb 11, 2026 at 04:07:54PM -0800, Vishal Moola (Oracle) wrote:
> On Wed, Feb 11, 2026 at 02:18:20PM -0800, Vishal Moola (Oracle) wrote:
> > On Wed, Feb 11, 2026 at 12:13:10PM -0800, Dave Hansen wrote:
> > > On 2/11/26 11:52, Vishal Moola (Oracle) wrote:
> > > > +/**
> > > > + * pgtable_alloc_addr - Allocate pagetables to get an address
> > > > + * @gfp: GFP flags
> > > > + * @order: desired pagetable order
> > >
> > > FWIW, I don't like how pgtable_alloc_addr() looks in practice. It reads
> > > like it is: "allocate a page table address", not "allocate a page
> > > table". I don't have a better suggestion other than having:
> >
> > Hmmm. I meant for it to read "allocate a page table and get its address."
> >
> > > pgtable_alloc()
> > >
> > > that returns a page table pointer, a void*, and:
> >
> > Initially, I intended to name it pgtable_alloc() & pgtable_free(). I saw
> > arm using pgtable_alloc() and powerpc using pgtable_free(), so I looked
> > for another name.
>
> I've done some digging about these names.
> The arm cases use a function pointer, so we should be able to use that
> name without issue.
Dave, I wanted to follow up on the below question:
> What do you think is a reasonable name for freeing?
>
> pgtable_free() is defined for sparc and powerpc. I could rename them
> prefixed with "__" to get the name since they only have 1-2 internal
> callers.
Matthew brought another question to my attention in this particular
scenario. Should pat/set_memory's alloc_*_page() use pte_alloc_one()
instead of get_zeroed_page()? Is there any reason not to?
* Re: [PATCH v5 1/4] mm: Add address apis for ptdescs
2026-02-18 20:23 ` Vishal Moola (Oracle)
@ 2026-02-18 20:27 ` Dave Hansen
0 siblings, 0 replies; 18+ messages in thread
From: Dave Hansen @ 2026-02-18 20:27 UTC (permalink / raw)
To: Vishal Moola (Oracle)
Cc: linux-kernel, linux-mm, x86, Mike Rapoport (Microsoft),
akpm, Matthew Wilcox (Oracle),
Dave Hansen, Andy Lutomirski, Peter Zijlstra
On 2/18/26 12:23, Vishal Moola (Oracle) wrote:
>> What do you think is a reasonable name for freeing?
>>
>> pgtable_free() is defined for sparc and powerpc. I could rename them
>> prefixed with "__" to get the name since they only have 1-2 internal
>> callers.
> Matthew brought another question to my attention in this particular
> scenario. Should pat/set_memory's alloc_*_page() use pte_alloc_one()
> instead of get_zeroed_page()? Is there any reason not to?
They're not special in any way I can think of. There's no reason I know
of to keep them special and avoid converting them.