linux-mm.kvack.org archive mirror
* [RFC v1 00/10] Misc powerpc fixes and refactoring
@ 2026-02-25 11:04 Ritesh Harjani (IBM)
  2026-02-25 11:04 ` [RFC v1 02/10] powerpc: book3s64: Fix unmap race with PMD THP migration entry Ritesh Harjani (IBM)
                   ` (9 more replies)
  0 siblings, 10 replies; 12+ messages in thread
From: Ritesh Harjani (IBM) @ 2026-02-25 11:04 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-mm, Hugh Dickins, Andrew Morton, Madhavan Srinivasan,
	Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy,
	Venkat Rao Bagalkote, Ritesh Harjani (IBM)

Hello All,

- Patches 1 & 2 are fixes found during reviews and while running mm selftests.
- Patch 3 adds a way to verify/test this race using debug_vm_pgtable.c.
- Patches 4-10 are various cleanups and refactorings that I have been carrying in
  my tree. I felt it's time to push them, if those changes seem logical to others too.

Please review and share your thoughts!

-ritesh


Ritesh Harjani (IBM) (10):
  powerpc/pgtable-frag: Fix bad page state in pte_frag_destroy
  powerpc: book3s64: Fix unmap race with PMD THP migration entry
  mm/debug_vm_pgtable.c: Add test to zap THP migration entry
  powerpc/64s/tlbflush-radix: Remove unused radix__flush_tlb_pwc()
  powerpc/64s: Move serialize_against_pte_lookup() to hash_pgtable.c
  powerpc/64s: Kill the unused argument of exit_lazy_flush_tlb
  powerpc: book3s64: Rename tlbie_va_lpid to tlbie_va_pid_lpid
  powerpc: book3s64: Rename tlbie_lpid_va to tlbie_va_lpid
  powerpc: book3s64: Make use of H_RPTI_TYPE_ALL macro
  powerpc: Add MMU_FTRS_POSSIBLE & MMU_FTRS_ALWAYS

 arch/powerpc/include/asm/book3s/64/pgtable.h  |  1 -
 .../include/asm/book3s/64/tlbflush-radix.h    |  1 -
 arch/powerpc/kernel/setup-common.c            |  4 ++
 arch/powerpc/mm/book3s64/hash_pgtable.c       | 21 +++++++
 arch/powerpc/mm/book3s64/internal.h           |  2 -
 arch/powerpc/mm/book3s64/pgtable.c            | 46 ++++++--------
 arch/powerpc/mm/book3s64/radix_tlb.c          | 61 ++++++++-----------
 arch/powerpc/mm/pgtable-frag.c                |  1 +
 mm/debug_vm_pgtable.c                         | 38 ++++++++++++
 9 files changed, 108 insertions(+), 67 deletions(-)

--
2.53.0



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC v1 01/10] powerpc/pgtable-frag: Fix bad page state in pte_frag_destroy
  2026-02-25 11:42 ` [RFC v1 01/10] powerpc/pgtable-frag: Fix bad page state in pte_frag_destroy Ritesh Harjani
@ 2026-02-25 11:04   ` Ritesh Harjani (IBM)
  0 siblings, 0 replies; 12+ messages in thread
From: Ritesh Harjani (IBM) @ 2026-02-25 11:04 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-mm, Hugh Dickins, Andrew Morton, Madhavan Srinivasan,
	Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy,
	Venkat Rao Bagalkote, Ritesh Harjani (IBM)

powerpc uses pt_frag_refcount as a reference counter for tracking its
pte and pmd page table fragments. For PTE tables, in the case of Hash with
a 64K page size, we have 16 fragments of 4K size in one 64K page.

Patch series [1] "mm: free retracted page table by RCU"
added pte_free_defer() to defer the freeing of PTE tables when
retract_page_tables() is called for madvise MADV_COLLAPSE on shmem
range.
[1]: https://lore.kernel.org/all/7cd843a9-aa80-14f-5eb2-33427363c20@google.com/

pte_free_defer() sets the active flag on the corresponding fragment's
folio and calls pte_fragment_free(), which decrements pt_frag_refcount.
When pt_frag_refcount reaches 0 (no active fragment using the folio), it
checks whether the folio's active flag is set: if set, it uses call_rcu()
to free the folio; if the active flag is unset, it calls pte_free_now().

Now, this can lead to the following problem in a corner case...

[  265.351553][  T183] BUG: Bad page state in process a.out  pfn:20d62
[  265.353555][  T183] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x20d62
[  265.355457][  T183] flags: 0x3ffff800000100(active|node=0|zone=0|lastcpupid=0x7ffff)
[  265.358719][  T183] raw: 003ffff800000100 0000000000000000 5deadbeef0000122 0000000000000000
[  265.360177][  T183] raw: 0000000000000000 c0000000119caf58 00000000ffffffff 0000000000000000
[  265.361438][  T183] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[  265.362572][  T183] Modules linked in:
[  265.364622][  T183] CPU: 0 UID: 0 PID: 183 Comm: a.out Not tainted 6.18.0-rc3-00141-g1ddeaaace7ff-dirty #53 VOLUNTARY
[  265.364785][  T183] Hardware name: IBM pSeries (emulated by qemu) POWER10 (architected) 0x801200 0xf000006 of:SLOF,git-ee03ae pSeries
[  265.364908][  T183] Call Trace:
[  265.364955][  T183] [c000000011e6f7c0] [c000000001cfaa18] dump_stack_lvl+0x130/0x148 (unreliable)
[  265.365202][  T183] [c000000011e6f7f0] [c000000000794758] bad_page+0xb4/0x1c8
[  265.365384][  T183] [c000000011e6f890] [c00000000079c020] __free_frozen_pages+0x838/0xd08
[  265.365554][  T183] [c000000011e6f980] [c0000000000a70ac] pte_frag_destroy+0x298/0x310
[  265.365729][  T183] [c000000011e6fa30] [c0000000000aa764] arch_exit_mmap+0x34/0x218
[  265.365912][  T183] [c000000011e6fa80] [c000000000751698] exit_mmap+0xb8/0x820
[  265.366080][  T183] [c000000011e6fc30] [c0000000001b1258] __mmput+0x98/0x300
[  265.366244][  T183] [c000000011e6fc80] [c0000000001c81f8] do_exit+0x470/0x1508
[  265.366421][  T183] [c000000011e6fd70] [c0000000001c95e4] do_group_exit+0x88/0x148
[  265.366602][  T183] [c000000011e6fdc0] [c0000000001c96ec] pid_child_should_wake+0x0/0x178
[  265.366780][  T183] [c000000011e6fdf0] [c00000000003a270] system_call_exception+0x1b0/0x4e0
[  265.366958][  T183] [c000000011e6fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec

The bad page state error occurs when such a folio gets freed (with the
active flag still set) from the do_exit() path in parallel.

... this can happen when the pte fragment was allocated from this folio,
but after all the handed-out fragments were freed, pt_frag_refcount still
accounted for some unused fragments. Now, if this process exits with such
a folio as its cached pte_frag in mm->context, then pte_frag_destroy()
simply calls pagetable_dtor() and pagetable_free(), i.e. it doesn't clear
the active flag. This can lead to the above bug. Since we are anyway in
the do_exit() path, if the refcount is 0 it should be ok to simply clear
the folio active flag before calling pagetable_dtor() and
pagetable_free().

Fixes: 32cc0b7c9d50 ("powerpc: add pte_free_defer() for pgtables sharing page")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 arch/powerpc/mm/pgtable-frag.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index 77e55eac16e4..ae742564a3d5 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -25,6 +25,7 @@ void pte_frag_destroy(void *pte_frag)
 	count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
 	/* We allow PTE_FRAG_NR fragments from a PTE page */
 	if (atomic_sub_and_test(PTE_FRAG_NR - count, &ptdesc->pt_frag_refcount)) {
+		folio_clear_active(ptdesc_folio(ptdesc));
 		pagetable_dtor(ptdesc);
 		pagetable_free(ptdesc);
 	}
-- 
2.53.0




* [RFC v1 02/10] powerpc: book3s64: Fix unmap race with PMD THP migration entry
  2026-02-25 11:04 [RFC v1 00/10] Misc powerpc fixes and refactoring Ritesh Harjani (IBM)
@ 2026-02-25 11:04 ` Ritesh Harjani (IBM)
  2026-02-25 11:04 ` [RFC v1 03/10] mm/debug_vm_pgtable.c: Add test to zap " Ritesh Harjani (IBM)
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Ritesh Harjani (IBM) @ 2026-02-25 11:04 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-mm, Hugh Dickins, Andrew Morton, Madhavan Srinivasan,
	Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy,
	Venkat Rao Bagalkote, Ritesh Harjani (IBM),
	Pavithra Prakash

The following race is possible with migration swap entries or
device-private THP entries, e.g. when move_pages() is called on a PMD THP
page. There may be an intermediate state where the PMD entry acts as
a migration swap entry (pmd_present() is false). If a munmap happens
at the same time, then this VM_BUG_ON() can trigger in
pmdp_huge_get_and_clear_full().

This patch fixes that.

Thread A: move_pages() syscall
  add_folio_for_migration()
    mmap_read_lock(mm)
    folio_isolate_lru(folio)
    mmap_read_unlock(mm)

  do_move_pages_to_node()
    migrate_pages()
      try_to_migrate_one()
        spin_lock(ptl)
        set_pmd_migration_entry()
          pmdp_invalidate()     # PMD: _PAGE_INVALID | _PAGE_PTE | pfn
          set_pmd_at()          # PMD: migration swap entry (pmd_present=0)
        spin_unlock(ptl)
        [page copy phase]       # <--- RACE WINDOW -->

Thread B: munmap()
  mmap_write_downgrade(mm)
  unmap_vmas() -> zap_pmd_range()
    zap_huge_pmd()
      __pmd_trans_huge_lock()
        pmd_is_huge():          # !pmd_present && !pmd_none -> TRUE (swap entry)
        pmd_lock() -> 		# spin_lock(ptl), waits for Thread A to release ptl
      pmdp_huge_get_and_clear_full()
        VM_BUG_ON(!pmd_present(*pmdp))  # HITS!

[  287.738700][ T1867] ------------[ cut here ]------------
[  287.743843][ T1867] kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:187!
cpu 0x0: Vector: 700 (Program Check) at [c00000044037f4f0]
    pc: c000000000094ca4: pmdp_huge_get_and_clear_full+0x6c/0x23c
    lr: c000000000645dec: zap_huge_pmd+0xb0/0x868
    sp: c00000044037f790
   msr: 800000000282b033
  current = 0xc0000004032c1a00
  paca    = 0xc000000004fe0000   irqmask: 0x03   irq_happened: 0x09
    pid   = 1867, comm = a.out
kernel BUG at :187!
Linux version 6.19.0-12136-g14360d4f917c-dirty (powerpc64le-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #27 SMP PREEMPT Sun Feb 22 10:38:56 IST 2026
enter ? for help
[link register   ] c000000000645dec zap_huge_pmd+0xb0/0x868
[c00000044037f790] c00000044037f7d0 (unreliable)
[c00000044037f7d0] c000000000645dcc zap_huge_pmd+0x90/0x868
[c00000044037f840] c0000000005724cc unmap_page_range+0x176c/0x1f40
[c00000044037fa00] c000000000572ea0 unmap_vmas+0xb0/0x1d8
[c00000044037fa90] c0000000005af254 unmap_region+0xb4/0x128
[c00000044037fb50] c0000000005af400 vms_complete_munmap_vmas+0x138/0x310
[c00000044037fbe0] c0000000005b0f1c do_vmi_align_munmap+0x1ec/0x238
[c00000044037fd30] c0000000005b3688 __vm_munmap+0x170/0x1f8
[c00000044037fdf0] c000000000587f74 sys_munmap+0x2c/0x40
[c00000044037fe10] c000000000032668 system_call_exception+0x128/0x350
[c00000044037fe50] c00000000000d05c system_call_vectored_common+0x15c/0x2ec
---- Exception: 3000 (System Call Vectored) at 0000000010064a2c
SP (7fff9b1ee9c0) is in userspace
0:mon> zh

Fixes: 75358ea359e7c ("powerpc/mm/book3s64: Fix MADV_DONTNEED and parallel page fault race")
Reported-by: Pavithra Prakash <pavrampu@linux.vnet.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 arch/powerpc/mm/book3s64/pgtable.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 4b09c04654a8..359092001670 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -210,8 +210,23 @@ pmd_t pmdp_huge_get_and_clear_full(struct vm_area_struct *vma,
 {
 	pmd_t pmd;
 	VM_BUG_ON(addr & ~HPAGE_PMD_MASK);
-	VM_BUG_ON((pmd_present(*pmdp) && !pmd_trans_huge(*pmdp)) ||
-		   !pmd_present(*pmdp));
+	VM_BUG_ON((pmd_present(*pmdp) && !pmd_trans_huge(*pmdp)));
+
+	if (!pmd_present(*pmdp)) {
+		/*
+		 * Non-present PMDs can be migration entries or device-private
+	 * THP entries. Since these are non-present, there is no TLB
+	 * backing. This happens when the address space is being unmapped
+	 * via zap_huge_pmd() and we encounter non-present pmds. So it is
+	 * safe to just clear the PMDs here; zap_huge_pmd() will take
+	 * care of withdrawing the deposited page table.
+		 */
+		pmd = pmdp_get(pmdp);
+		pmd_clear(pmdp);
+		page_table_check_pmd_clear(vma->vm_mm, addr, pmd);
+		return pmd;
+	}
+
 	pmd = pmdp_huge_get_and_clear(vma->vm_mm, addr, pmdp);
 	/*
 	 * if it not a fullmm flush, then we can possibly end up converting
--
2.53.0




* [RFC v1 03/10] mm/debug_vm_pgtable.c: Add test to zap THP migration entry
  2026-02-25 11:04 [RFC v1 00/10] Misc powerpc fixes and refactoring Ritesh Harjani (IBM)
  2026-02-25 11:04 ` [RFC v1 02/10] powerpc: book3s64: Fix unmap race with PMD THP migration entry Ritesh Harjani (IBM)
@ 2026-02-25 11:04 ` Ritesh Harjani (IBM)
  2026-02-25 11:04 ` [RFC v1 04/10] powerpc/64s/tlbflush-radix: Remove unused radix__flush_tlb_pwc() Ritesh Harjani (IBM)
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Ritesh Harjani (IBM) @ 2026-02-25 11:04 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-mm, Hugh Dickins, Andrew Morton, Madhavan Srinivasan,
	Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy,
	Venkat Rao Bagalkote, Ritesh Harjani (IBM)

As discussed in the previous patch, a race is possible between
zap_huge_pmd() and migrate_pages().
This adds a test to verify that race.

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 mm/debug_vm_pgtable.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 83cf07269f13..802f9f03c8ef 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -837,8 +837,45 @@ static void __init pmd_softleaf_tests(struct pgtable_debug_args *args)
 	pmd2 = __swp_entry_to_pmd(arch_entry);
 	WARN_ON(memcmp(&pmd1, &pmd2, sizeof(pmd1)));
 }
+
+
+static void __init pmd_thp_migration_zap_tests(struct pgtable_debug_args *args)
+{
+	pmd_t pmd;
+	unsigned long vaddr = args->vaddr & HPAGE_PMD_MASK;
+
+	if (!has_transparent_hugepage() || !thp_migration_supported())
+		return;
+
+	pr_debug("Validating PMD zap on THP migration entry\n");
+
+	pmd = swp_entry_to_pmd(args->leaf_entry);
+	pgtable_trans_huge_deposit(args->mm, args->pmdp, args->start_ptep);
+
+	/* Verify that it's a valid migration PMD before we proceed */
+	WARN_ON(!pmd_is_huge(pmd));
+	WARN_ON(!pmd_is_valid_softleaf(pmd));
+	WARN_ON(pmd_present(pmd));
+	WARN_ON(pmd_none(pmd));
+
+	/* Install the migration PMD entry */
+	set_pmd_at(args->mm, vaddr, args->pmdp, pmd);
+
+	/*
+	 * THP migration path can race with zap_huge_pmd(), which calls
+	 * pmdp_huge_get_and_clear_full().
+	 */
+	pmd = pmdp_huge_get_and_clear_full(args->vma, vaddr, args->pmdp, 1);
+
+	WARN_ON(!pmd_is_valid_softleaf(pmd));
+	WARN_ON(!pmd_none(pmdp_get(args->pmdp)));
+
+	pgtable_trans_huge_withdraw(args->mm, args->pmdp);
+}
+
 #else  /* !CONFIG_ARCH_ENABLE_THP_MIGRATION */
 static void __init pmd_softleaf_tests(struct pgtable_debug_args *args) { }
+static void __init pmd_thp_migration_zap_tests(struct pgtable_debug_args *args) { }
 #endif /* CONFIG_ARCH_ENABLE_THP_MIGRATION */
 
 static void __init swap_migration_tests(struct pgtable_debug_args *args)
@@ -1348,6 +1385,7 @@ static int __init debug_vm_pgtable(void)
 	pmd_clear_tests(&args);
 	pmd_advanced_tests(&args);
 	pmd_huge_tests(&args);
+	pmd_thp_migration_zap_tests(&args);
 	pmd_populate_tests(&args);
 	spin_unlock(ptl);
 
-- 
2.53.0




* [RFC v1 04/10] powerpc/64s/tlbflush-radix: Remove unused radix__flush_tlb_pwc()
  2026-02-25 11:04 [RFC v1 00/10] Misc powerpc fixes and refactoring Ritesh Harjani (IBM)
  2026-02-25 11:04 ` [RFC v1 02/10] powerpc: book3s64: Fix unmap race with PMD THP migration entry Ritesh Harjani (IBM)
  2026-02-25 11:04 ` [RFC v1 03/10] mm/debug_vm_pgtable.c: Add test to zap " Ritesh Harjani (IBM)
@ 2026-02-25 11:04 ` Ritesh Harjani (IBM)
  2026-02-25 11:04 ` [RFC v1 05/10] powerpc/64s: Move serialize_against_pte_lookup() to hash_pgtable.c Ritesh Harjani (IBM)
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Ritesh Harjani (IBM) @ 2026-02-25 11:04 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-mm, Hugh Dickins, Andrew Morton, Madhavan Srinivasan,
	Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy,
	Venkat Rao Bagalkote, Ritesh Harjani (IBM)

Commit 52162ec784fa
("powerpc/mm/book3s64/radix: Use freed_tables instead of need_flush_all")
removed the radix__flush_tlb_pwc() definition, but missed removing the
extern declaration. This patch removes it.

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 arch/powerpc/include/asm/book3s/64/tlbflush-radix.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index a38542259fab..de9b96660582 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -92,7 +92,6 @@ extern void radix__flush_tlb_page_psize(struct mm_struct *mm, unsigned long vmad
 #define radix__flush_tlb_page(vma,addr)	radix__local_flush_tlb_page(vma,addr)
 #define radix__flush_tlb_page_psize(mm,addr,p) radix__local_flush_tlb_page_psize(mm,addr,p)
 #endif
-extern void radix__flush_tlb_pwc(struct mmu_gather *tlb, unsigned long addr);
 extern void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr);
 extern void radix__flush_tlb_all(void);
 
-- 
2.53.0




* [RFC v1 05/10] powerpc/64s: Move serialize_against_pte_lookup() to hash_pgtable.c
  2026-02-25 11:04 [RFC v1 00/10] Misc powerpc fixes and refactoring Ritesh Harjani (IBM)
                   ` (2 preceding siblings ...)
  2026-02-25 11:04 ` [RFC v1 04/10] powerpc/64s/tlbflush-radix: Remove unused radix__flush_tlb_pwc() Ritesh Harjani (IBM)
@ 2026-02-25 11:04 ` Ritesh Harjani (IBM)
  2026-02-25 11:04 ` [RFC v1 06/10] powerpc/64s: Kill the unused argument of exit_lazy_flush_tlb Ritesh Harjani (IBM)
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Ritesh Harjani (IBM) @ 2026-02-25 11:04 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-mm, Hugh Dickins, Andrew Morton, Madhavan Srinivasan,
	Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy,
	Venkat Rao Bagalkote, Ritesh Harjani (IBM)

Originally,
commit fa4531f753f1 ("powerpc/mm: Don't send IPI to all cpus on THP updates")
introduced the serialize_against_pte_lookup() call for both Radix and Hash.

However, the race was later fixed for Radix by
commit 70cbc3cc78a9 ("mm: gup: fix the fast GUP race against THP collapse")

And therefore the following commit removed the
serialize_against_pte_lookup() call from radix_pgtable.c:
commit bedf03416913
("powerpc/64s/radix: don't need to broadcast IPI for radix pmd collapse flush")

Now serialize_against_pte_lookup() only gets called from
hash__pmdp_collapse_flush(), so move the related functions to
hash_pgtable.c.

Hence this patch:
- moves serialize_against_pte_lookup() from radix_pgtable.c to hash_pgtable.c
- removes the radix-specific calls from do_serialize()
- renames do_serialize() to do_nothing()

There should be no functional change in this patch.

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 arch/powerpc/include/asm/book3s/64/pgtable.h |  1 -
 arch/powerpc/mm/book3s64/hash_pgtable.c      | 21 ++++++++++++++++
 arch/powerpc/mm/book3s64/pgtable.c           | 25 --------------------
 3 files changed, 21 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 1a91762b455d..ff264d930fe8 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1400,7 +1400,6 @@ static inline bool arch_needs_pgtable_deposit(void)
 		return false;
 	return true;
 }
-extern void serialize_against_pte_lookup(struct mm_struct *mm);
 
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c b/arch/powerpc/mm/book3s64/hash_pgtable.c
index ac2a24d15d2e..d9b5b751d7b7 100644
--- a/arch/powerpc/mm/book3s64/hash_pgtable.c
+++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
@@ -221,6 +221,27 @@ unsigned long hash__pmd_hugepage_update(struct mm_struct *mm, unsigned long addr
 	return old;
 }
 
+static void do_nothing(void *arg)
+{
+
+}
+
+/*
+ * Serialize against __find_linux_pte() which does lock-less
+ * lookup in page tables with local interrupts disabled. For huge pages
+ * it casts pmd_t to pte_t. Since format of pte_t is different from
+ * pmd_t we want to prevent transit from pmd pointing to page table
+ * to pmd pointing to huge page (and back) while interrupts are disabled.
+ * We clear pmd to possibly replace it with page table pointer in
+ * different code paths. So make sure we wait for the parallel
+ * __find_linux_pte() to finish.
+ */
+static void serialize_against_pte_lookup(struct mm_struct *mm)
+{
+	smp_mb();
+	smp_call_function_many(mm_cpumask(mm), do_nothing, mm, 1);
+}
+
 pmd_t hash__pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 			    pmd_t *pmdp)
 {
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 359092001670..84284dff650a 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -150,31 +150,6 @@ void set_pud_at(struct mm_struct *mm, unsigned long addr,
 	return set_pte_at_unchecked(mm, addr, pudp_ptep(pudp), pud_pte(pud));
 }
 
-static void do_serialize(void *arg)
-{
-	/* We've taken the IPI, so try to trim the mask while here */
-	if (radix_enabled()) {
-		struct mm_struct *mm = arg;
-		exit_lazy_flush_tlb(mm, false);
-	}
-}
-
-/*
- * Serialize against __find_linux_pte() which does lock-less
- * lookup in page tables with local interrupts disabled. For huge pages
- * it casts pmd_t to pte_t. Since format of pte_t is different from
- * pmd_t we want to prevent transit from pmd pointing to page table
- * to pmd pointing to huge page (and back) while interrupts are disabled.
- * We clear pmd to possibly replace it with page table pointer in
- * different code paths. So make sure we wait for the parallel
- * __find_linux_pte() to finish.
- */
-void serialize_against_pte_lookup(struct mm_struct *mm)
-{
-	smp_mb();
-	smp_call_function_many(mm_cpumask(mm), do_serialize, mm, 1);
-}
-
 /*
  * We use this to invalidate a pmdp entry before switching from a
  * hugepte to regular pmd entry.
-- 
2.53.0




* [RFC v1 06/10] powerpc/64s: Kill the unused argument of exit_lazy_flush_tlb
  2026-02-25 11:04 [RFC v1 00/10] Misc powerpc fixes and refactoring Ritesh Harjani (IBM)
                   ` (3 preceding siblings ...)
  2026-02-25 11:04 ` [RFC v1 05/10] powerpc/64s: Move serialize_against_pte_lookup() to hash_pgtable.c Ritesh Harjani (IBM)
@ 2026-02-25 11:04 ` Ritesh Harjani (IBM)
  2026-02-25 11:04 ` [RFC v1 07/10] powerpc: book3s64: Rename tlbie_va_lpid to tlbie_va_pid_lpid Ritesh Harjani (IBM)
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Ritesh Harjani (IBM) @ 2026-02-25 11:04 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-mm, Hugh Dickins, Andrew Morton, Madhavan Srinivasan,
	Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy,
	Venkat Rao Bagalkote, Ritesh Harjani (IBM)

The previous patch removed the only caller of exit_lazy_flush_tlb()
that was passing always_flush = false as its second argument.

With that gone, all the callers of exit_lazy_flush_tlb() are local to
radix_tlb.c and there is no need for the additional argument.

This patch does the required cleanup. There should be no functional
change in this patch.

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 arch/powerpc/mm/book3s64/internal.h  |  2 --
 arch/powerpc/mm/book3s64/pgtable.c   |  2 --
 arch/powerpc/mm/book3s64/radix_tlb.c | 14 +++++---------
 3 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/internal.h b/arch/powerpc/mm/book3s64/internal.h
index cad08d83369c..f7055251c8b7 100644
--- a/arch/powerpc/mm/book3s64/internal.h
+++ b/arch/powerpc/mm/book3s64/internal.h
@@ -31,6 +31,4 @@ static inline bool slb_preload_disabled(void)
 
 void hpt_do_stress(unsigned long ea, unsigned long hpte_group);
 
-void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush);
-
 #endif /* ARCH_POWERPC_MM_BOOK3S64_INTERNAL_H */
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 84284dff650a..52d3e0c4a030 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -23,8 +23,6 @@
 #include <mm/mmu_decl.h>
 #include <trace/events/thp.h>
 
-#include "internal.h"
-
 struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
 EXPORT_SYMBOL_GPL(mmu_psize_defs);
 
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 9e1f6558d026..339bd276840b 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -19,8 +19,6 @@
 #include <asm/cputhreads.h>
 #include <asm/plpar_wrappers.h>
 
-#include "internal.h"
-
 /*
  * tlbiel instruction for radix, set invalidation
  * i.e., r=1 and is=01 or is=10 or is=11
@@ -660,7 +658,7 @@ static bool mm_needs_flush_escalation(struct mm_struct *mm)
  * If always_flush is true, then flush even if this CPU can't be removed
  * from mm_cpumask.
  */
-void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush)
+static void exit_lazy_flush_tlb(struct mm_struct *mm)
 {
 	unsigned long pid = mm->context.id;
 	int cpu = smp_processor_id();
@@ -703,19 +701,17 @@ void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush)
 	if (cpumask_test_cpu(cpu, mm_cpumask(mm))) {
 		dec_mm_active_cpus(mm);
 		cpumask_clear_cpu(cpu, mm_cpumask(mm));
-		always_flush = true;
 	}
 
 out:
-	if (always_flush)
-		_tlbiel_pid(pid, RIC_FLUSH_ALL);
+	_tlbiel_pid(pid, RIC_FLUSH_ALL);
 }
 
 #ifdef CONFIG_SMP
 static void do_exit_flush_lazy_tlb(void *arg)
 {
 	struct mm_struct *mm = arg;
-	exit_lazy_flush_tlb(mm, true);
+	exit_lazy_flush_tlb(mm);
 }
 
 static void exit_flush_lazy_tlbs(struct mm_struct *mm)
@@ -777,7 +773,7 @@ static enum tlb_flush_type flush_type_needed(struct mm_struct *mm, bool fullmm)
 			 * to trim.
 			 */
 			if (tick_and_test_trim_clock()) {
-				exit_lazy_flush_tlb(mm, true);
+				exit_lazy_flush_tlb(mm);
 				return FLUSH_TYPE_NONE;
 			}
 		}
@@ -823,7 +819,7 @@ static enum tlb_flush_type flush_type_needed(struct mm_struct *mm, bool fullmm)
 		if (current->mm == mm)
 			return FLUSH_TYPE_LOCAL;
 		if (cpumask_test_cpu(cpu, mm_cpumask(mm)))
-			exit_lazy_flush_tlb(mm, true);
+			exit_lazy_flush_tlb(mm);
 		return FLUSH_TYPE_NONE;
 	}
 
-- 
2.53.0




* [RFC v1 07/10] powerpc: book3s64: Rename tlbie_va_lpid to tlbie_va_pid_lpid
  2026-02-25 11:04 [RFC v1 00/10] Misc powerpc fixes and refactoring Ritesh Harjani (IBM)
                   ` (4 preceding siblings ...)
  2026-02-25 11:04 ` [RFC v1 06/10] powerpc/64s: Kill the unused argument of exit_lazy_flush_tlb Ritesh Harjani (IBM)
@ 2026-02-25 11:04 ` Ritesh Harjani (IBM)
  2026-02-25 11:04 ` [RFC v1 08/10] powerpc: book3s64: Rename tlbie_lpid_va to tlbie_va_lpid Ritesh Harjani (IBM)
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Ritesh Harjani (IBM) @ 2026-02-25 11:04 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-mm, Hugh Dickins, Andrew Morton, Madhavan Srinivasan,
	Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy,
	Venkat Rao Bagalkote, Ritesh Harjani (IBM)

It makes sense to rename these functions so that their names better
reflect what they do. E.g. the name __tlbie_va_pid_lpid better reflects
that it performs tlbie invalidation using VA, PID and LPID.

No functional change in this patch.

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 arch/powerpc/mm/book3s64/radix_tlb.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 339bd276840b..1adf20798ca6 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -1411,7 +1411,7 @@ static __always_inline void __tlbie_pid_lpid(unsigned long pid,
 	trace_tlbie(0, 0, rb, rs, ric, prs, r);
 }
 
-static __always_inline void __tlbie_va_lpid(unsigned long va, unsigned long pid,
+static __always_inline void __tlbie_va_pid_lpid(unsigned long va, unsigned long pid,
 					    unsigned long lpid,
 					    unsigned long ap, unsigned long ric)
 {
@@ -1443,7 +1443,7 @@ static inline void fixup_tlbie_pid_lpid(unsigned long pid, unsigned long lpid)
 
 	if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
 		asm volatile("ptesync" : : : "memory");
-		__tlbie_va_lpid(va, pid, lpid, mmu_get_ap(MMU_PAGE_64K),
+		__tlbie_va_pid_lpid(va, pid, lpid, mmu_get_ap(MMU_PAGE_64K),
 				RIC_FLUSH_TLB);
 	}
 }
@@ -1474,7 +1474,7 @@ static inline void _tlbie_pid_lpid(unsigned long pid, unsigned long lpid,
 	asm volatile("eieio; tlbsync; ptesync" : : : "memory");
 }
 
-static inline void fixup_tlbie_va_range_lpid(unsigned long va,
+static inline void fixup_tlbie_va_range_pid_lpid(unsigned long va,
 					     unsigned long pid,
 					     unsigned long lpid,
 					     unsigned long ap)
@@ -1486,11 +1486,11 @@ static inline void fixup_tlbie_va_range_lpid(unsigned long va,
 
 	if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
 		asm volatile("ptesync" : : : "memory");
-		__tlbie_va_lpid(va, pid, lpid, ap, RIC_FLUSH_TLB);
+		__tlbie_va_pid_lpid(va, pid, lpid, ap, RIC_FLUSH_TLB);
 	}
 }
 
-static inline void __tlbie_va_range_lpid(unsigned long start, unsigned long end,
+static inline void __tlbie_va_range_pid_lpid(unsigned long start, unsigned long end,
 					 unsigned long pid, unsigned long lpid,
 					 unsigned long page_size,
 					 unsigned long psize)
@@ -1499,12 +1499,12 @@ static inline void __tlbie_va_range_lpid(unsigned long start, unsigned long end,
 	unsigned long ap = mmu_get_ap(psize);
 
 	for (addr = start; addr < end; addr += page_size)
-		__tlbie_va_lpid(addr, pid, lpid, ap, RIC_FLUSH_TLB);
+		__tlbie_va_pid_lpid(addr, pid, lpid, ap, RIC_FLUSH_TLB);
 
-	fixup_tlbie_va_range_lpid(addr - page_size, pid, lpid, ap);
+	fixup_tlbie_va_range_pid_lpid(addr - page_size, pid, lpid, ap);
 }
 
-static inline void _tlbie_va_range_lpid(unsigned long start, unsigned long end,
+static inline void _tlbie_va_range_pid_lpid(unsigned long start, unsigned long end,
 					unsigned long pid, unsigned long lpid,
 					unsigned long page_size,
 					unsigned long psize, bool also_pwc)
@@ -1512,7 +1512,7 @@ static inline void _tlbie_va_range_lpid(unsigned long start, unsigned long end,
 	asm volatile("ptesync" : : : "memory");
 	if (also_pwc)
 		__tlbie_pid_lpid(pid, lpid, RIC_FLUSH_PWC);
-	__tlbie_va_range_lpid(start, end, pid, lpid, page_size, psize);
+	__tlbie_va_range_pid_lpid(start, end, pid, lpid, page_size, psize);
 	asm volatile("eieio; tlbsync; ptesync" : : : "memory");
 }
 
@@ -1563,7 +1563,7 @@ void do_h_rpt_invalidate_prt(unsigned long pid, unsigned long lpid,
 			_tlbie_pid_lpid(pid, lpid, RIC_FLUSH_TLB);
 			return;
 		}
-		_tlbie_va_range_lpid(start, end, pid, lpid,
+		_tlbie_va_range_pid_lpid(start, end, pid, lpid,
 				     (1UL << def->shift), psize, false);
 	}
 }
-- 
2.53.0




* [RFC v1 08/10] powerpc: book3s64: Rename tlbie_lpid_va to tlbie_va_lpid
  2026-02-25 11:04 [RFC v1 00/10] Misc powerpc fixes and refactoring Ritesh Harjani (IBM)
                   ` (5 preceding siblings ...)
  2026-02-25 11:04 ` [RFC v1 07/10] powerpc: book3s64: Rename tlbie_va_lpid to tlbie_va_pid_lpid Ritesh Harjani (IBM)
@ 2026-02-25 11:04 ` Ritesh Harjani (IBM)
  2026-02-25 11:04 ` [RFC v1 09/10] powerpc: book3s64: Make use of H_RPTI_TYPE_ALL macro Ritesh Harjani (IBM)
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Ritesh Harjani (IBM) @ 2026-02-25 11:04 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-mm, Hugh Dickins, Andrew Morton, Madhavan Srinivasan,
	Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy,
	Venkat Rao Bagalkote, Ritesh Harjani (IBM)

The previous patch renamed the tlbie_va_lpid functions to
tlbie_va_pid_lpid(), since those operate on PIDs as well. That frees up
the name, allowing tlbie_lpid_va to be renamed to tlbie_va_lpid, which
finally makes all the tlbie function naming consistent.

No functional change in this patch.

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 arch/powerpc/mm/book3s64/radix_tlb.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 1adf20798ca6..6ce94eaefc1b 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -185,7 +185,7 @@ static __always_inline void __tlbie_va(unsigned long va, unsigned long pid,
 	trace_tlbie(0, 0, rb, rs, ric, prs, r);
 }
 
-static __always_inline void __tlbie_lpid_va(unsigned long va, unsigned long lpid,
+static __always_inline void __tlbie_va_lpid(unsigned long va, unsigned long lpid,
 					    unsigned long ap, unsigned long ric)
 {
 	unsigned long rb,rs,prs,r;
@@ -249,17 +249,17 @@ static inline void fixup_tlbie_pid(unsigned long pid)
 	}
 }
 
-static inline void fixup_tlbie_lpid_va(unsigned long va, unsigned long lpid,
+static inline void fixup_tlbie_va_lpid(unsigned long va, unsigned long lpid,
 				       unsigned long ap)
 {
 	if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) {
 		asm volatile("ptesync": : :"memory");
-		__tlbie_lpid_va(va, 0, ap, RIC_FLUSH_TLB);
+		__tlbie_va_lpid(va, 0, ap, RIC_FLUSH_TLB);
 	}
 
 	if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
 		asm volatile("ptesync": : :"memory");
-		__tlbie_lpid_va(va, lpid, ap, RIC_FLUSH_TLB);
+		__tlbie_va_lpid(va, lpid, ap, RIC_FLUSH_TLB);
 	}
 }
 
@@ -278,7 +278,7 @@ static inline void fixup_tlbie_lpid(unsigned long lpid)
 
 	if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
 		asm volatile("ptesync": : :"memory");
-		__tlbie_lpid_va(va, lpid, mmu_get_ap(MMU_PAGE_64K), RIC_FLUSH_TLB);
+		__tlbie_va_lpid(va, lpid, mmu_get_ap(MMU_PAGE_64K), RIC_FLUSH_TLB);
 	}
 }
 
@@ -529,14 +529,14 @@ static void do_tlbiel_va_range(void *info)
 				    t->psize, t->also_pwc);
 }
 
-static __always_inline void _tlbie_lpid_va(unsigned long va, unsigned long lpid,
+static __always_inline void _tlbie_va_lpid(unsigned long va, unsigned long lpid,
 			      unsigned long psize, unsigned long ric)
 {
 	unsigned long ap = mmu_get_ap(psize);
 
 	asm volatile("ptesync": : :"memory");
-	__tlbie_lpid_va(va, lpid, ap, ric);
-	fixup_tlbie_lpid_va(va, lpid, ap);
+	__tlbie_va_lpid(va, lpid, ap, ric);
+	fixup_tlbie_va_lpid(va, lpid, ap);
 	asm volatile("eieio; tlbsync; ptesync": : :"memory");
 }
 
@@ -1147,7 +1147,7 @@ void radix__flush_tlb_lpid_page(unsigned int lpid,
 {
 	int psize = radix_get_mmu_psize(page_size);
 
-	_tlbie_lpid_va(addr, lpid, psize, RIC_FLUSH_TLB);
+	_tlbie_va_lpid(addr, lpid, psize, RIC_FLUSH_TLB);
 }
 EXPORT_SYMBOL_GPL(radix__flush_tlb_lpid_page);
 
-- 
2.53.0




* [RFC v1 09/10] powerpc: book3s64: Make use of H_RPTI_TYPE_ALL macro
  2026-02-25 11:04 [RFC v1 00/10] Misc powerpc fixes and refactoring Ritesh Harjani (IBM)
                   ` (6 preceding siblings ...)
  2026-02-25 11:04 ` [RFC v1 08/10] powerpc: book3s64: Rename tlbie_lpid_va to tlbie_va_lpid Ritesh Harjani (IBM)
@ 2026-02-25 11:04 ` Ritesh Harjani (IBM)
  2026-02-25 11:04 ` [RFC v1 10/10] powerpc: Add MMU_FTRS_POSSIBLE & MMU_FTRS_ALWAYS Ritesh Harjani (IBM)
  2026-02-25 11:42 ` [RFC v1 01/10] powerpc/pgtable-frag: Fix bad page state in pte_frag_destroy Ritesh Harjani
  9 siblings, 0 replies; 12+ messages in thread
From: Ritesh Harjani (IBM) @ 2026-02-25 11:04 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-mm, Hugh Dickins, Andrew Morton, Madhavan Srinivasan,
	Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy,
	Venkat Rao Bagalkote, Ritesh Harjani (IBM)

Instead of open-coding the flag combination, use the pre-defined
H_RPTI_TYPE_ALL macro in the following places.

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 arch/powerpc/mm/book3s64/radix_tlb.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 6ce94eaefc1b..7de5760164a9 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -885,8 +885,7 @@ static void __flush_all_mm(struct mm_struct *mm, bool fullmm)
 	} else if (type == FLUSH_TYPE_GLOBAL) {
 		if (!mmu_has_feature(MMU_FTR_GTSE)) {
 			unsigned long tgt = H_RPTI_TARGET_CMMU;
-			unsigned long type = H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
-					     H_RPTI_TYPE_PRT;
+			unsigned long type = H_RPTI_TYPE_ALL;
 
 			if (atomic_read(&mm->context.copros) > 0)
 				tgt |= H_RPTI_TARGET_NMMU;
@@ -982,8 +981,7 @@ void radix__flush_tlb_kernel_range(unsigned long start, unsigned long end)
 {
 	if (!mmu_has_feature(MMU_FTR_GTSE)) {
 		unsigned long tgt = H_RPTI_TARGET_CMMU | H_RPTI_TARGET_NMMU;
-		unsigned long type = H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
-				     H_RPTI_TYPE_PRT;
+		unsigned long type = H_RPTI_TYPE_ALL;
 
 		pseries_rpt_invalidate(0, tgt, type, H_RPTI_PAGE_ALL,
 				       start, end);
@@ -1337,8 +1335,7 @@ void radix__flush_tlb_collapsed_pmd(struct mm_struct *mm, unsigned long addr)
 			unsigned long tgt, type, pg_sizes;
 
 			tgt = H_RPTI_TARGET_CMMU;
-			type = H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
-			       H_RPTI_TYPE_PRT;
+			type = H_RPTI_TYPE_ALL;
 			pg_sizes = psize_to_rpti_pgsize(mmu_virtual_psize);
 
 			if (atomic_read(&mm->context.copros) > 0)
-- 
2.53.0




* [RFC v1 10/10] powerpc: Add MMU_FTRS_POSSIBLE & MMU_FTRS_ALWAYS
  2026-02-25 11:04 [RFC v1 00/10] Misc powerpc fixes and refactoring Ritesh Harjani (IBM)
                   ` (7 preceding siblings ...)
  2026-02-25 11:04 ` [RFC v1 09/10] powerpc: book3s64: Make use of H_RPTI_TYPE_ALL macro Ritesh Harjani (IBM)
@ 2026-02-25 11:04 ` Ritesh Harjani (IBM)
  2026-02-25 11:42 ` [RFC v1 01/10] powerpc/pgtable-frag: Fix bad page state in pte_frag_destroy Ritesh Harjani
  9 siblings, 0 replies; 12+ messages in thread
From: Ritesh Harjani (IBM) @ 2026-02-25 11:04 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-mm, Hugh Dickins, Andrew Morton, Madhavan Srinivasan,
	Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy,
	Venkat Rao Bagalkote, Ritesh Harjani (IBM)

Similar to CPU_FTRS_[POSSIBLE|ALWAYS], let's also print
MMU_FTRS_[POSSIBLE|ALWAYS]. This is useful data to capture during
bootup.

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 arch/powerpc/kernel/setup-common.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index cb5b73adc250..002b312eb7e9 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -866,6 +866,10 @@ static __init void print_system_info(void)
 		cur_cpu_spec->cpu_user_features,
 		cur_cpu_spec->cpu_user_features2);
 	pr_info("mmu_features      = 0x%08x\n", cur_cpu_spec->mmu_features);
+	pr_info("  possible        = 0x%016lx\n",
+		(unsigned long)MMU_FTRS_POSSIBLE);
+	pr_info("  always          = 0x%016lx\n",
+		(unsigned long)MMU_FTRS_ALWAYS);
 #ifdef CONFIG_PPC64
 	pr_info("firmware_features = 0x%016lx\n", powerpc_firmware_features);
 #ifdef CONFIG_PPC_BOOK3S
--
2.53.0




* [RFC v1 01/10] powerpc/pgtable-frag: Fix bad page state in pte_frag_destroy
  2026-02-25 11:04 [RFC v1 00/10] Misc powerpc fixes and refactoring Ritesh Harjani (IBM)
                   ` (8 preceding siblings ...)
  2026-02-25 11:04 ` [RFC v1 10/10] powerpc: Add MMU_FTRS_POSSIBLE & MMU_FTRS_ALWAYS Ritesh Harjani (IBM)
@ 2026-02-25 11:42 ` Ritesh Harjani
  2026-02-25 11:04   ` Ritesh Harjani (IBM)
  9 siblings, 1 reply; 12+ messages in thread
From: Ritesh Harjani @ 2026-02-25 11:42 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-mm, Hugh Dickins, Andrew Morton, Madhavan Srinivasan,
	Nicholas Piggin, Aneesh Kumar K . V, Christophe Leroy,
	Venkat Rao Bagalkote, Ritesh Harjani (IBM)

From: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>

powerpc uses pt_frag_refcount as a reference counter for tracking its
pte and pmd page table fragments. For PTE tables, in the case of Hash
with a 64K page size, there are 16 fragments of 4K each in one 64K page.

Patch series [1] "mm: free retracted page table by RCU"
added pte_free_defer() to defer the freeing of PTE tables when
retract_page_tables() is called for madvise MADV_COLLAPSE on shmem
range.
[1]: https://lore.kernel.org/all/7cd843a9-aa80-14f-5eb2-33427363c20@google.com/

pte_free_defer() sets the active flag on the corresponding fragment's
folio and calls pte_fragment_free(), which decrements pt_frag_refcount.
When pt_frag_refcount reaches 0 (no active fragment is using the folio),
it checks the folio's active flag: if it is set, it uses call_rcu() to
free the folio; if it is unset, it calls pte_free_now().

Now, this can lead to the following problem in a corner case...

[  265.351553][  T183] BUG: Bad page state in process a.out  pfn:20d62
[  265.353555][  T183] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x20d62
[  265.355457][  T183] flags: 0x3ffff800000100(active|node=0|zone=0|lastcpupid=0x7ffff)
[  265.358719][  T183] raw: 003ffff800000100 0000000000000000 5deadbeef0000122 0000000000000000
[  265.360177][  T183] raw: 0000000000000000 c0000000119caf58 00000000ffffffff 0000000000000000
[  265.361438][  T183] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[  265.362572][  T183] Modules linked in:
[  265.364622][  T183] CPU: 0 UID: 0 PID: 183 Comm: a.out Not tainted 6.18.0-rc3-00141-g1ddeaaace7ff-dirty #53 VOLUNTARY
[  265.364785][  T183] Hardware name: IBM pSeries (emulated by qemu) POWER10 (architected) 0x801200 0xf000006 of:SLOF,git-ee03ae pSeries
[  265.364908][  T183] Call Trace:
[  265.364955][  T183] [c000000011e6f7c0] [c000000001cfaa18] dump_stack_lvl+0x130/0x148 (unreliable)
[  265.365202][  T183] [c000000011e6f7f0] [c000000000794758] bad_page+0xb4/0x1c8
[  265.365384][  T183] [c000000011e6f890] [c00000000079c020] __free_frozen_pages+0x838/0xd08
[  265.365554][  T183] [c000000011e6f980] [c0000000000a70ac] pte_frag_destroy+0x298/0x310
[  265.365729][  T183] [c000000011e6fa30] [c0000000000aa764] arch_exit_mmap+0x34/0x218
[  265.365912][  T183] [c000000011e6fa80] [c000000000751698] exit_mmap+0xb8/0x820
[  265.366080][  T183] [c000000011e6fc30] [c0000000001b1258] __mmput+0x98/0x300
[  265.366244][  T183] [c000000011e6fc80] [c0000000001c81f8] do_exit+0x470/0x1508
[  265.366421][  T183] [c000000011e6fd70] [c0000000001c95e4] do_group_exit+0x88/0x148
[  265.366602][  T183] [c000000011e6fdc0] [c0000000001c96ec] pid_child_should_wake+0x0/0x178
[  265.366780][  T183] [c000000011e6fdf0] [c00000000003a270] system_call_exception+0x1b0/0x4e0
[  265.366958][  T183] [c000000011e6fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec

The bad page state error occurs when such a folio gets freed from the
do_exit() path with the active flag still set.

... this can happen when a pte fragment allocated from this folio was
freed via the deferred path (setting the active flag), but
pt_frag_refcount did not reach 0 because some fragments of the folio
were never handed out. If the process then exits with this folio as its
cached pte_frag in mm->context, pte_frag_destroy() simply calls
pagetable_dtor() and pagetable_free() without clearing the active flag,
which leads to the above bug. Since we are in the do_exit() path anyway,
once the refcount drops to 0 it should be safe to simply clear the
folio's active flag before calling pagetable_dtor() and
pagetable_free().

Fixes: 32cc0b7c9d50 ("powerpc: add pte_free_defer() for pgtables sharing page")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 arch/powerpc/mm/pgtable-frag.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index 77e55eac16e4..ae742564a3d5 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -25,6 +25,7 @@ void pte_frag_destroy(void *pte_frag)
 	count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
 	/* We allow PTE_FRAG_NR fragments from a PTE page */
 	if (atomic_sub_and_test(PTE_FRAG_NR - count, &ptdesc->pt_frag_refcount)) {
+		folio_clear_active(ptdesc_folio(ptdesc));
 		pagetable_dtor(ptdesc);
 		pagetable_free(ptdesc);
 	}
-- 
2.53.0




end of thread, other threads:[~2026-02-25 13:13 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-25 11:04 [RFC v1 00/10] Misc powerpc fixes and refactoring Ritesh Harjani (IBM)
2026-02-25 11:04 ` [RFC v1 02/10] powerpc: book3s64: Fix unmap race with PMD THP migration entry Ritesh Harjani (IBM)
2026-02-25 11:04 ` [RFC v1 03/10] mm/debug_vm_pgtable.c: Add test to zap " Ritesh Harjani (IBM)
2026-02-25 11:04 ` [RFC v1 04/10] powerpc/64s/tlbflush-radix: Remove unused radix__flush_tlb_pwc() Ritesh Harjani (IBM)
2026-02-25 11:04 ` [RFC v1 05/10] powerpc/64s: Move serialize_against_pte_lookup() to hash_pgtable.c Ritesh Harjani (IBM)
2026-02-25 11:04 ` [RFC v1 06/10] powerpc/64s: Kill the unused argument of exit_lazy_flush_tlb Ritesh Harjani (IBM)
2026-02-25 11:04 ` [RFC v1 07/10] powerpc: book3s64: Rename tlbie_va_lpid to tlbie_va_pid_lpid Ritesh Harjani (IBM)
2026-02-25 11:04 ` [RFC v1 08/10] powerpc: book3s64: Rename tlbie_lpid_va to tlbie_va_lpid Ritesh Harjani (IBM)
2026-02-25 11:04 ` [RFC v1 09/10] powerpc: book3s64: Make use of H_RPTI_TYPE_ALL macro Ritesh Harjani (IBM)
2026-02-25 11:04 ` [RFC v1 10/10] powerpc: Add MMU_FTRS_POSSIBLE & MMU_FTRS_ALWAYS Ritesh Harjani (IBM)
2026-02-25 11:42 ` [RFC v1 01/10] powerpc/pgtable-frag: Fix bad page state in pte_frag_destroy Ritesh Harjani
2026-02-25 11:04   ` Ritesh Harjani (IBM)
