* [PATCH v3 0/7] enable PT_RECLAIM on all 64-bit architectures
@ 2025-12-17 9:45 Qi Zheng
2025-12-17 9:45 ` [PATCH v3 1/7] mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h Qi Zheng
` (6 more replies)
0 siblings, 7 replies; 23+ messages in thread
From: Qi Zheng @ 2025-12-17 9:45 UTC (permalink / raw)
To: will, aneesh.kumar, npiggin, peterz, dev.jain, akpm, david,
ioworker0, linmag7
Cc: linux-arch, linux-kernel, linux-mm, linux-alpha, loongarch,
linux-mips, linux-parisc, linux-um, Qi Zheng
From: Qi Zheng <zhengqi.arch@bytedance.com>
Changes in v3:
- modify the commit message in [PATCH v3 1/7] (suggested by David Hildenbrand)
- make PT_RECLAIM depend on MMU_GATHER_RCU_TABLE_FREE instead of 64BIT
- collect Acked-by
- rebase onto the next-20251217
Changelog in v2:
- fix compilation errors (reported by Magnus Lindholm and kernel test robot)
- adjust some code style (suggested by Huacai Chen)
- make PT_RECLAIM user-unselectable (suggested by David Hildenbrand)
- rebase onto the next-20251119
Hi all,
This series aims to enable PT_RECLAIM on all 64-bit architectures.
On a 64-bit system, madvise(MADV_DONTNEED) may leave behind a large number of
empty PTE page table pages (100GB+ of them in extreme cases). To resolve this
problem, we need to enable PT_RECLAIM, which depends on MMU_GATHER_RCU_TABLE_FREE.
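
To make the scenario concrete, it can be reproduced with a userspace sketch
along the following lines (sizes are arbitrary, and the 2MB step assumes
x86-64 PMD geometry): it populates one PTE in every PTE page table of a large
mapping and then discards the data with MADV_DONTNEED, leaving all of those
page tables empty but still allocated unless PT_RECLAIM frees them:

	/*
	 * Illustration only: sizes are hypothetical; one PTE table covers a
	 * 2MB PMD range on x86-64.
	 */
	#define _GNU_SOURCE
	#include <stdio.h>
	#include <sys/mman.h>

	int main(void)
	{
		size_t len  = 8UL << 30;	/* 8GB of anonymous memory */
		size_t step = 2UL << 20;	/* one page per PMD range */
		char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

		if (p == MAP_FAILED)
			return 1;

		/* Populate one PTE in each of the ~4096 PTE page tables. */
		for (size_t off = 0; off < len; off += step)
			p[off] = 1;

		/*
		 * Frees the data pages and clears the PTEs, but the now-empty
		 * PTE page tables (see VmPTE in /proc/self/status) are only
		 * freed here if PT_RECLAIM is enabled; otherwise they stay
		 * around until munmap()/exit.
		 */
		madvise(p, len, MADV_DONTNEED);
		getchar();	/* pause to inspect VmPTE */
		return 0;
	}
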
Therefore, this series first enables MMU_GATHER_RCU_TABLE_FREE on all 64-bit
architectures, and finally makes PT_RECLAIM depend on MMU_GATHER_RCU_TABLE_FREE.
This way, PT_RECLAIM can be enabled by default on all 64-bit architectures.
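
For reference, with the whole series applied the Kconfig entry collapses to
the following (quoted from patch 7/7):

	config PT_RECLAIM
		def_bool y
		depends on MMU_GATHER_RCU_TABLE_FREE
		help
		  Try to reclaim empty user page table pages in paths other than munmap
		  and exit_mmap path.
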
Of course, this will also enable PT_RECLAIM on some 32-bit architectures that
already support MMU_GATHER_RCU_TABLE_FREE. That's fine: PT_RECLAIM works well
on 32-bit architectures too. Although the benefit there isn't significant,
there is still memory that can be reclaimed. Perhaps PT_RECLAIM can be enabled
on all 32-bit architectures in the future.
Comments and suggestions are welcome!
Thanks,
Qi
Qi Zheng (7):
mm: change mm/pt_reclaim.c to use asm/tlb.h instead of
asm-generic/tlb.h
alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE
LoongArch: mm: enable MMU_GATHER_RCU_TABLE_FREE
mips: mm: enable MMU_GATHER_RCU_TABLE_FREE
parisc: mm: enable MMU_GATHER_RCU_TABLE_FREE
um: mm: enable MMU_GATHER_RCU_TABLE_FREE
mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE
arch/alpha/Kconfig | 1 +
arch/alpha/include/asm/tlb.h | 6 +++---
arch/loongarch/Kconfig | 1 +
arch/loongarch/include/asm/pgalloc.h | 7 +++----
arch/mips/Kconfig | 1 +
arch/mips/include/asm/pgalloc.h | 7 +++----
arch/parisc/Kconfig | 1 +
arch/parisc/include/asm/tlb.h | 4 ++--
arch/um/Kconfig | 1 +
arch/x86/Kconfig | 1 -
mm/Kconfig | 9 ++-------
mm/pt_reclaim.c | 2 +-
12 files changed, 19 insertions(+), 22 deletions(-)
--
2.20.1
^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 1/7] mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h
  2025-12-17 9:45 [PATCH v3 0/7] enable PT_RECLAIM on all 64-bit architectures Qi Zheng
@ 2025-12-17 9:45 ` Qi Zheng
  2025-12-17 9:45 ` [PATCH v3 2/7] alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE Qi Zheng
  ` (5 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Qi Zheng @ 2025-12-17 9:45 UTC (permalink / raw)
  To: will, aneesh.kumar, npiggin, peterz, dev.jain, akpm, david,
	ioworker0, linmag7
  Cc: linux-arch, linux-kernel, linux-mm, linux-alpha, loongarch,
	linux-mips, linux-parisc, linux-um, Qi Zheng

From: Qi Zheng <zhengqi.arch@bytedance.com>

Generally, the asm/tlb.h will include asm-generic/tlb.h, so change
mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h.

This is a preparation for enabling CONFIG_PT_RECLAIM on other
architectures, such as alpha.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
---
 mm/pt_reclaim.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/pt_reclaim.c b/mm/pt_reclaim.c
index 0d9cfbf4fe5d8..46771cfff8239 100644
--- a/mm/pt_reclaim.c
+++ b/mm/pt_reclaim.c
@@ -2,7 +2,7 @@
 #include <linux/hugetlb.h>
 #include <linux/pgalloc.h>
 
-#include <asm-generic/tlb.h>
+#include <asm/tlb.h>
 
 #include "internal.h"
 
-- 
2.20.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 2/7] alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE
  2025-12-17 9:45 [PATCH v3 0/7] enable PT_RECLAIM on all 64-bit architectures Qi Zheng
  2025-12-17 9:45 ` [PATCH v3 1/7] mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h Qi Zheng
@ 2025-12-17 9:45 ` Qi Zheng
  2025-12-17 9:45 ` [PATCH v3 3/7] LoongArch: " Qi Zheng
  ` (4 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Qi Zheng @ 2025-12-17 9:45 UTC (permalink / raw)
  To: will, aneesh.kumar, npiggin, peterz, dev.jain, akpm, david,
	ioworker0, linmag7
  Cc: linux-arch, linux-kernel, linux-mm, linux-alpha, loongarch,
	linux-mips, linux-parisc, linux-um, Qi Zheng, Richard Henderson,
	Matt Turner

From: Qi Zheng <zhengqi.arch@bytedance.com>

On a 64-bit system, madvise(MADV_DONTNEED) may cause a large number of
empty PTE page table pages (such as 100GB+). To resolve this problem,
first enable MMU_GATHER_RCU_TABLE_FREE to prepare for enabling the
PT_RECLAIM feature, which resolves this problem.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Matt Turner <mattst88@gmail.com>
---
 arch/alpha/Kconfig           | 1 +
 arch/alpha/include/asm/tlb.h | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 80367f2cf821c..6c7dbf0adad62 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -38,6 +38,7 @@ config ALPHA
 	select OLD_SIGSUSPEND
 	select CPU_NO_EFFICIENT_FFS if !ALPHA_EV67
 	select MMU_GATHER_NO_RANGE
+	select MMU_GATHER_RCU_TABLE_FREE
 	select SPARSEMEM_EXTREME if SPARSEMEM
 	select ZONE_DMA
 	help
diff --git a/arch/alpha/include/asm/tlb.h b/arch/alpha/include/asm/tlb.h
index 4f79e331af5ea..ad586b898fd6b 100644
--- a/arch/alpha/include/asm/tlb.h
+++ b/arch/alpha/include/asm/tlb.h
@@ -4,7 +4,7 @@
 
 #include <asm-generic/tlb.h>
 
-#define __pte_free_tlb(tlb, pte, address)	pte_free((tlb)->mm, pte)
-#define __pmd_free_tlb(tlb, pmd, address)	pmd_free((tlb)->mm, pmd)
-
+#define __pte_free_tlb(tlb, pte, address)	tlb_remove_ptdesc((tlb), page_ptdesc(pte))
+#define __pmd_free_tlb(tlb, pmd, address)	tlb_remove_ptdesc((tlb), virt_to_ptdesc(pmd))
+
 #endif
-- 
2.20.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 3/7] LoongArch: mm: enable MMU_GATHER_RCU_TABLE_FREE
  2025-12-17 9:45 [PATCH v3 0/7] enable PT_RECLAIM on all 64-bit architectures Qi Zheng
  2025-12-17 9:45 ` [PATCH v3 1/7] mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h Qi Zheng
  2025-12-17 9:45 ` [PATCH v3 2/7] alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE Qi Zheng
@ 2025-12-17 9:45 ` Qi Zheng
  2025-12-17 9:45 ` [PATCH v3 4/7] mips: " Qi Zheng
  ` (3 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Qi Zheng @ 2025-12-17 9:45 UTC (permalink / raw)
  To: will, aneesh.kumar, npiggin, peterz, dev.jain, akpm, david,
	ioworker0, linmag7
  Cc: linux-arch, linux-kernel, linux-mm, linux-alpha, loongarch,
	linux-mips, linux-parisc, linux-um, Qi Zheng, Huacai Chen,
	WANG Xuerui

From: Qi Zheng <zhengqi.arch@bytedance.com>

On a 64-bit system, madvise(MADV_DONTNEED) may cause a large number of
empty PTE page table pages (such as 100GB+). To resolve this problem,
first enable MMU_GATHER_RCU_TABLE_FREE to prepare for enabling the
PT_RECLAIM feature, which resolves this problem.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
---
 arch/loongarch/Kconfig               | 1 +
 arch/loongarch/include/asm/pgalloc.h | 7 +++----
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 730f342145197..43d5b863e1fb2 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -187,6 +187,7 @@ config LOONGARCH
 	select IRQ_LOONGARCH_CPU
 	select LOCK_MM_AND_FIND_VMA
 	select MMU_GATHER_MERGE_VMAS if MMU
+	select MMU_GATHER_RCU_TABLE_FREE
 	select MODULES_USE_ELF_RELA if MODULES
 	select NEED_PER_CPU_EMBED_FIRST_CHUNK
 	select NEED_PER_CPU_PAGE_FIRST_CHUNK
diff --git a/arch/loongarch/include/asm/pgalloc.h b/arch/loongarch/include/asm/pgalloc.h
index 08dcc698ec184..248f62d0b590e 100644
--- a/arch/loongarch/include/asm/pgalloc.h
+++ b/arch/loongarch/include/asm/pgalloc.h
@@ -55,8 +55,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 	return pte;
 }
 
-#define __pte_free_tlb(tlb, pte, address)	\
-	tlb_remove_ptdesc((tlb), page_ptdesc(pte))
+#define __pte_free_tlb(tlb, pte, address)	tlb_remove_ptdesc((tlb), page_ptdesc(pte))
 
 #ifndef __PAGETABLE_PMD_FOLDED
 
@@ -79,7 +78,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 	return pmd;
 }
 
-#define __pmd_free_tlb(tlb, x, addr)	pmd_free((tlb)->mm, x)
+#define __pmd_free_tlb(tlb, x, addr)	tlb_remove_ptdesc((tlb), virt_to_ptdesc(x))
 
 #endif
 
@@ -99,7 +98,7 @@ static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
 	return pud;
 }
 
-#define __pud_free_tlb(tlb, x, addr)	pud_free((tlb)->mm, x)
+#define __pud_free_tlb(tlb, x, addr)	tlb_remove_ptdesc((tlb), virt_to_ptdesc(x))
 
 #endif /* __PAGETABLE_PUD_FOLDED */
 
-- 
2.20.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 4/7] mips: mm: enable MMU_GATHER_RCU_TABLE_FREE
  2025-12-17 9:45 [PATCH v3 0/7] enable PT_RECLAIM on all 64-bit architectures Qi Zheng
  ` (2 preceding siblings ...)
  2025-12-17 9:45 ` [PATCH v3 3/7] LoongArch: " Qi Zheng
@ 2025-12-17 9:45 ` Qi Zheng
  2025-12-17 9:45 ` [PATCH v3 5/7] parisc: " Qi Zheng
  ` (2 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Qi Zheng @ 2025-12-17 9:45 UTC (permalink / raw)
  To: will, aneesh.kumar, npiggin, peterz, dev.jain, akpm, david,
	ioworker0, linmag7
  Cc: linux-arch, linux-kernel, linux-mm, linux-alpha, loongarch,
	linux-mips, linux-parisc, linux-um, Qi Zheng, Thomas Bogendoerfer

From: Qi Zheng <zhengqi.arch@bytedance.com>

On a 64-bit system, madvise(MADV_DONTNEED) may cause a large number of
empty PTE page table pages (such as 100GB+). To resolve this problem,
first enable MMU_GATHER_RCU_TABLE_FREE to prepare for enabling the
PT_RECLAIM feature, which resolves this problem.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
---
 arch/mips/Kconfig               | 1 +
 arch/mips/include/asm/pgalloc.h | 7 +++----
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index b88b97139fa8e..c0c94e26ce396 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -99,6 +99,7 @@ config MIPS
 	select IRQ_FORCED_THREADING
 	select ISA if EISA
 	select LOCK_MM_AND_FIND_VMA
+	select MMU_GATHER_RCU_TABLE_FREE
 	select MODULES_USE_ELF_REL if MODULES
 	select MODULES_USE_ELF_RELA if MODULES && 64BIT
 	select PERF_USE_VMALLOC
diff --git a/arch/mips/include/asm/pgalloc.h b/arch/mips/include/asm/pgalloc.h
index 7a04381efa0b5..895bf79e76762 100644
--- a/arch/mips/include/asm/pgalloc.h
+++ b/arch/mips/include/asm/pgalloc.h
@@ -48,8 +48,7 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
 extern void pgd_init(void *addr);
 extern pgd_t *pgd_alloc(struct mm_struct *mm);
 
-#define __pte_free_tlb(tlb, pte, address)	\
-	tlb_remove_ptdesc((tlb), page_ptdesc(pte))
+#define __pte_free_tlb(tlb, pte, address)	tlb_remove_ptdesc((tlb), page_ptdesc(pte))
 
 #ifndef __PAGETABLE_PMD_FOLDED
 
@@ -72,7 +71,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
 	return pmd;
 }
 
-#define __pmd_free_tlb(tlb, x, addr)	pmd_free((tlb)->mm, x)
+#define __pmd_free_tlb(tlb, x, addr)	tlb_remove_ptdesc((tlb), virt_to_ptdesc(x))
 
 #endif
 
@@ -97,7 +96,7 @@ static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
 	set_p4d(p4d, __p4d((unsigned long)pud));
 }
 
-#define __pud_free_tlb(tlb, x, addr)	pud_free((tlb)->mm, x)
+#define __pud_free_tlb(tlb, x, addr)	tlb_remove_ptdesc((tlb), virt_to_ptdesc(x))
 
 #endif /* __PAGETABLE_PUD_FOLDED */
 
-- 
2.20.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 5/7] parisc: mm: enable MMU_GATHER_RCU_TABLE_FREE
  2025-12-17 9:45 [PATCH v3 0/7] enable PT_RECLAIM on all 64-bit architectures Qi Zheng
  ` (3 preceding siblings ...)
  2025-12-17 9:45 ` [PATCH v3 4/7] mips: " Qi Zheng
@ 2025-12-17 9:45 ` Qi Zheng
  2025-12-17 9:45 ` [PATCH v3 6/7] um: " Qi Zheng
  2025-12-17 9:45 ` [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE Qi Zheng
  6 siblings, 0 replies; 23+ messages in thread
From: Qi Zheng @ 2025-12-17 9:45 UTC (permalink / raw)
  To: will, aneesh.kumar, npiggin, peterz, dev.jain, akpm, david,
	ioworker0, linmag7
  Cc: linux-arch, linux-kernel, linux-mm, linux-alpha, loongarch,
	linux-mips, linux-parisc, linux-um, Qi Zheng, James E.J. Bottomley,
	Helge Deller

From: Qi Zheng <zhengqi.arch@bytedance.com>

On a 64-bit system, madvise(MADV_DONTNEED) may cause a large number of
empty PTE page table pages (such as 100GB+). To resolve this problem,
first enable MMU_GATHER_RCU_TABLE_FREE to prepare for enabling the
PT_RECLAIM feature, which resolves this problem.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
---
 arch/parisc/Kconfig           | 1 +
 arch/parisc/include/asm/tlb.h | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 47fd9662d8005..62d5a89d5c7bc 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -79,6 +79,7 @@ config PARISC
 	select GENERIC_CLOCKEVENTS
 	select CPU_NO_EFFICIENT_FFS
 	select THREAD_INFO_IN_TASK
+	select MMU_GATHER_RCU_TABLE_FREE
 	select NEED_DMA_MAP_STATE
 	select NEED_SG_DMA_LENGTH
 	select HAVE_ARCH_KGDB
diff --git a/arch/parisc/include/asm/tlb.h b/arch/parisc/include/asm/tlb.h
index 44235f367674d..4501fee0a8fa4 100644
--- a/arch/parisc/include/asm/tlb.h
+++ b/arch/parisc/include/asm/tlb.h
@@ -5,8 +5,8 @@
 #include <asm-generic/tlb.h>
 
 #if CONFIG_PGTABLE_LEVELS == 3
-#define __pmd_free_tlb(tlb, pmd, addr)	pmd_free((tlb)->mm, pmd)
+#define __pmd_free_tlb(tlb, pmd, addr)	tlb_remove_ptdesc((tlb), virt_to_ptdesc(pmd))
 #endif
 
-#define __pte_free_tlb(tlb, pte, addr)	pte_free((tlb)->mm, pte)
+#define __pte_free_tlb(tlb, pte, addr)	tlb_remove_ptdesc((tlb), page_ptdesc(pte))
 
 #endif
-- 
2.20.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 6/7] um: mm: enable MMU_GATHER_RCU_TABLE_FREE
  2025-12-17 9:45 [PATCH v3 0/7] enable PT_RECLAIM on all 64-bit architectures Qi Zheng
  ` (4 preceding siblings ...)
  2025-12-17 9:45 ` [PATCH v3 5/7] parisc: " Qi Zheng
@ 2025-12-17 9:45 ` Qi Zheng
  2025-12-17 9:45 ` [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE Qi Zheng
  6 siblings, 0 replies; 23+ messages in thread
From: Qi Zheng @ 2025-12-17 9:45 UTC (permalink / raw)
  To: will, aneesh.kumar, npiggin, peterz, dev.jain, akpm, david,
	ioworker0, linmag7
  Cc: linux-arch, linux-kernel, linux-mm, linux-alpha, loongarch,
	linux-mips, linux-parisc, linux-um, Qi Zheng, Richard Weinberger,
	Anton Ivanov, Johannes Berg

From: Qi Zheng <zhengqi.arch@bytedance.com>

On a 64-bit system, madvise(MADV_DONTNEED) may cause a large number of
empty PTE page table pages (such as 100GB+). To resolve this problem,
first enable MMU_GATHER_RCU_TABLE_FREE to prepare for enabling the
PT_RECLAIM feature, which resolves this problem.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
---
 arch/um/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/um/Kconfig b/arch/um/Kconfig
index 8415d39b0d430..098cda44db225 100644
--- a/arch/um/Kconfig
+++ b/arch/um/Kconfig
@@ -42,6 +42,7 @@ config UML
 	select HAVE_SYSCALL_TRACEPOINTS
 	select THREAD_INFO_IN_TASK
 	select SPARSE_IRQ
+	select MMU_GATHER_RCU_TABLE_FREE
 
 config MMU
 	bool
-- 
2.20.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE
  2025-12-17 9:45 [PATCH v3 0/7] enable PT_RECLAIM on all 64-bit architectures Qi Zheng
  ` (5 preceding siblings ...)
  2025-12-17 9:45 ` [PATCH v3 6/7] um: " Qi Zheng
@ 2025-12-17 9:45 ` Qi Zheng
  2025-12-31 9:42   ` Wei Yang
  ` (3 more replies)
  6 siblings, 4 replies; 23+ messages in thread
From: Qi Zheng @ 2025-12-17 9:45 UTC (permalink / raw)
  To: will, aneesh.kumar, npiggin, peterz, dev.jain, akpm, david,
	ioworker0, linmag7
  Cc: linux-arch, linux-kernel, linux-mm, linux-alpha, loongarch,
	linux-mips, linux-parisc, linux-um, Qi Zheng

From: Qi Zheng <zhengqi.arch@bytedance.com>

The PT_RECLAIM can work on all architectures that support
MMU_GATHER_RCU_TABLE_FREE, so make PT_RECLAIM depends on
MMU_GATHER_RCU_TABLE_FREE.

BTW, change PT_RECLAIM to be enabled by default, since nobody should want
to turn it off.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 arch/x86/Kconfig | 1 -
 mm/Kconfig       | 9 ++-------
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 80527299f859a..0d22da56a71b0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -331,7 +331,6 @@ config X86
 	select FUNCTION_ALIGNMENT_4B
 	imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI
 	select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
-	select ARCH_SUPPORTS_PT_RECLAIM if X86_64
 	select ARCH_SUPPORTS_SCHED_SMT if SMP
 	select SCHED_SMT if SMP
 	select ARCH_SUPPORTS_SCHED_CLUSTER if SMP
diff --git a/mm/Kconfig b/mm/Kconfig
index bd0ea5454af82..fc00b429b7129 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1447,14 +1447,9 @@ config ARCH_HAS_USER_SHADOW_STACK
 	  The architecture has hardware support for userspace shadow call
 	  stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss).
 
-config ARCH_SUPPORTS_PT_RECLAIM
-	def_bool n
-
 config PT_RECLAIM
-	bool "reclaim empty user page table pages"
-	default y
-	depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP
-	select MMU_GATHER_RCU_TABLE_FREE
+	def_bool y
+	depends on MMU_GATHER_RCU_TABLE_FREE
 	help
 	  Try to reclaim empty user page table pages in paths other than munmap
 	  and exit_mmap path.
-- 
2.20.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE
  2025-12-17 9:45 ` [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE Qi Zheng
@ 2025-12-31 9:42   ` Wei Yang
  2025-12-31 9:52     ` Qi Zheng
  2026-01-18 11:23   ` David Hildenbrand (Red Hat)
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Wei Yang @ 2025-12-31 9:42 UTC (permalink / raw)
  To: Qi Zheng
  Cc: will, aneesh.kumar, npiggin, peterz, dev.jain, akpm, david,
	ioworker0, linmag7, linux-arch, linux-kernel, linux-mm,
	linux-alpha, loongarch, linux-mips, linux-parisc, linux-um,
	Qi Zheng

On Wed, Dec 17, 2025 at 05:45:48PM +0800, Qi Zheng wrote:
>From: Qi Zheng <zhengqi.arch@bytedance.com>
>
>The PT_RECLAIM can work on all architectures that support
>MMU_GATHER_RCU_TABLE_FREE, so make PT_RECLAIM depends on
>MMU_GATHER_RCU_TABLE_FREE.
>
>BTW, change PT_RECLAIM to be enabled by default, since nobody should want
>to turn it off.
>
>Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>---
> arch/x86/Kconfig | 1 -
> mm/Kconfig       | 9 ++-------
> 2 files changed, 2 insertions(+), 8 deletions(-)
>
>diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>index 80527299f859a..0d22da56a71b0 100644
>--- a/arch/x86/Kconfig
>+++ b/arch/x86/Kconfig
>@@ -331,7 +331,6 @@ config X86
> 	select FUNCTION_ALIGNMENT_4B
> 	imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI
> 	select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
>-	select ARCH_SUPPORTS_PT_RECLAIM if X86_64
> 	select ARCH_SUPPORTS_SCHED_SMT if SMP
> 	select SCHED_SMT if SMP
> 	select ARCH_SUPPORTS_SCHED_CLUSTER if SMP
>diff --git a/mm/Kconfig b/mm/Kconfig
>index bd0ea5454af82..fc00b429b7129 100644
>--- a/mm/Kconfig
>+++ b/mm/Kconfig
>@@ -1447,14 +1447,9 @@ config ARCH_HAS_USER_SHADOW_STACK
> 	  The architecture has hardware support for userspace shadow call
> 	  stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss).
> 
>-config ARCH_SUPPORTS_PT_RECLAIM
>-	def_bool n
>-
> config PT_RECLAIM
>-	bool "reclaim empty user page table pages"
>-	default y
>-	depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP
>-	select MMU_GATHER_RCU_TABLE_FREE
>+	def_bool y
>+	depends on MMU_GATHER_RCU_TABLE_FREE
> 	help
> 	  Try to reclaim empty user page table pages in paths other than munmap
> 	  and exit_mmap path.

Hi, Qi

I am new to PT_RECLAIM, when reading related code I got one question.

Before this patch, we could have this config combination:

  CONFIG_MMU_GATHER_RCU_TABLE_FREE & !CONFIG_PT_RECLAIM

This means tlb_remove_table_free() is rcu version while tlb_remove_table_one()
is semi rcu version.

I am curious could we use rcu version tlb_remove_table_one() for this case?
Use rcu version tlb_remove_table_one() if CONFIG_MMU_GATHER_RCU_TABLE_FREE. Is
there some limitation here?

Thanks in advance for your explanation.

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE 2025-12-31 9:42 ` Wei Yang @ 2025-12-31 9:52 ` Qi Zheng 2026-01-01 2:07 ` Wei Yang 0 siblings, 1 reply; 23+ messages in thread From: Qi Zheng @ 2025-12-31 9:52 UTC (permalink / raw) To: Wei Yang Cc: will, aneesh.kumar, npiggin, peterz, dev.jain, akpm, david, ioworker0, linmag7, linux-arch, linux-kernel, linux-mm, linux-alpha, loongarch, linux-mips, linux-parisc, linux-um, Qi Zheng On 12/31/25 5:42 PM, Wei Yang wrote: > On Wed, Dec 17, 2025 at 05:45:48PM +0800, Qi Zheng wrote: >> From: Qi Zheng <zhengqi.arch@bytedance.com> >> >> The PT_RECLAIM can work on all architectures that support >> MMU_GATHER_RCU_TABLE_FREE, so make PT_RECLAIM depends on >> MMU_GATHER_RCU_TABLE_FREE. >> >> BTW, change PT_RECLAIM to be enabled by default, since nobody should want >> to turn it off. >> >> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >> --- >> arch/x86/Kconfig | 1 - >> mm/Kconfig | 9 ++------- >> 2 files changed, 2 insertions(+), 8 deletions(-) >> >> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig >> index 80527299f859a..0d22da56a71b0 100644 >> --- a/arch/x86/Kconfig >> +++ b/arch/x86/Kconfig >> @@ -331,7 +331,6 @@ config X86 >> select FUNCTION_ALIGNMENT_4B >> imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI >> select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE >> - select ARCH_SUPPORTS_PT_RECLAIM if X86_64 >> select ARCH_SUPPORTS_SCHED_SMT if SMP >> select SCHED_SMT if SMP >> select ARCH_SUPPORTS_SCHED_CLUSTER if SMP >> diff --git a/mm/Kconfig b/mm/Kconfig >> index bd0ea5454af82..fc00b429b7129 100644 >> --- a/mm/Kconfig >> +++ b/mm/Kconfig >> @@ -1447,14 +1447,9 @@ config ARCH_HAS_USER_SHADOW_STACK >> The architecture has hardware support for userspace shadow call >> stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss). >> >> -config ARCH_SUPPORTS_PT_RECLAIM >> - def_bool n >> - >> config PT_RECLAIM >> - bool "reclaim empty user page table pages" >> - default y >> - depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP >> - select MMU_GATHER_RCU_TABLE_FREE >> + def_bool y >> + depends on MMU_GATHER_RCU_TABLE_FREE >> help >> Try to reclaim empty user page table pages in paths other than munmap >> and exit_mmap path. > > Hi, Qi > > I am new to PT_RECLAIM, when reading related code I got one question. > > Before this patch, we could have this config combination: > > CONFIG_MMU_GATHER_RCU_TABLE_FREE & !CONFIG_PT_RECLAIM > > This means tlb_remove_table_free() is rcu version while tlb_remove_table_one() > is semi rcu version. > > I am curious could we use rcu version tlb_remove_table_one() for this case? > Use rcu version tlb_remove_table_one() if CONFIG_MMU_GATHER_RCU_TABLE_FREE. Is > there some limitation here? I think there's no problem. The rcu version can also ensure that the fast GUP works well. > > Thanks in advance for your explanation. > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE 2025-12-31 9:52 ` Qi Zheng @ 2026-01-01 2:07 ` Wei Yang 2026-01-19 10:18 ` David Hildenbrand (Red Hat) 0 siblings, 1 reply; 23+ messages in thread From: Wei Yang @ 2026-01-01 2:07 UTC (permalink / raw) To: Qi Zheng Cc: Wei Yang, will, aneesh.kumar, npiggin, peterz, dev.jain, akpm, david, ioworker0, linmag7, linux-arch, linux-kernel, linux-mm, linux-alpha, loongarch, linux-mips, linux-parisc, linux-um, Qi Zheng On Wed, Dec 31, 2025 at 05:52:57PM +0800, Qi Zheng wrote: > > >On 12/31/25 5:42 PM, Wei Yang wrote: >> On Wed, Dec 17, 2025 at 05:45:48PM +0800, Qi Zheng wrote: >> > From: Qi Zheng <zhengqi.arch@bytedance.com> >> > >> > The PT_RECLAIM can work on all architectures that support >> > MMU_GATHER_RCU_TABLE_FREE, so make PT_RECLAIM depends on >> > MMU_GATHER_RCU_TABLE_FREE. >> > >> > BTW, change PT_RECLAIM to be enabled by default, since nobody should want >> > to turn it off. >> > >> > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >> > --- >> > arch/x86/Kconfig | 1 - >> > mm/Kconfig | 9 ++------- >> > 2 files changed, 2 insertions(+), 8 deletions(-) >> > >> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig >> > index 80527299f859a..0d22da56a71b0 100644 >> > --- a/arch/x86/Kconfig >> > +++ b/arch/x86/Kconfig >> > @@ -331,7 +331,6 @@ config X86 >> > select FUNCTION_ALIGNMENT_4B >> > imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI >> > select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE >> > - select ARCH_SUPPORTS_PT_RECLAIM if X86_64 >> > select ARCH_SUPPORTS_SCHED_SMT if SMP >> > select SCHED_SMT if SMP >> > select ARCH_SUPPORTS_SCHED_CLUSTER if SMP >> > diff --git a/mm/Kconfig b/mm/Kconfig >> > index bd0ea5454af82..fc00b429b7129 100644 >> > --- a/mm/Kconfig >> > +++ b/mm/Kconfig >> > @@ -1447,14 +1447,9 @@ config ARCH_HAS_USER_SHADOW_STACK >> > The architecture has hardware support for userspace shadow call >> > stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss). >> > >> > -config ARCH_SUPPORTS_PT_RECLAIM >> > - def_bool n >> > - >> > config PT_RECLAIM >> > - bool "reclaim empty user page table pages" >> > - default y >> > - depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP >> > - select MMU_GATHER_RCU_TABLE_FREE >> > + def_bool y >> > + depends on MMU_GATHER_RCU_TABLE_FREE >> > help >> > Try to reclaim empty user page table pages in paths other than munmap >> > and exit_mmap path. >> >> Hi, Qi >> >> I am new to PT_RECLAIM, when reading related code I got one question. >> >> Before this patch, we could have this config combination: >> >> CONFIG_MMU_GATHER_RCU_TABLE_FREE & !CONFIG_PT_RECLAIM >> >> This means tlb_remove_table_free() is rcu version while tlb_remove_table_one() >> is semi rcu version. >> >> I am curious could we use rcu version tlb_remove_table_one() for this case? >> Use rcu version tlb_remove_table_one() if CONFIG_MMU_GATHER_RCU_TABLE_FREE. Is >> there some limitation here? > >I think there's no problem. The rcu version can also ensure that the >fast GUP works well. > Thanks for your quick response :-) And Happy New Year So my little suggestion is move the definition of __tlb_remove_table_one() under CONFIG_MMU_GATHER_RCU_TABLE_FREE. Do you thinks this would be more clear? >> >> Thanks in advance for your explanation. >> >> -- Wei Yang Help you, Help me ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE 2026-01-01 2:07 ` Wei Yang @ 2026-01-19 10:18 ` David Hildenbrand (Red Hat) 2026-01-22 14:00 ` Wei Yang 0 siblings, 1 reply; 23+ messages in thread From: David Hildenbrand (Red Hat) @ 2026-01-19 10:18 UTC (permalink / raw) To: Wei Yang, Qi Zheng Cc: will, aneesh.kumar, npiggin, peterz, dev.jain, akpm, ioworker0, linmag7, linux-arch, linux-kernel, linux-mm, linux-alpha, loongarch, linux-mips, linux-parisc, linux-um, Qi Zheng On 1/1/26 03:07, Wei Yang wrote: > On Wed, Dec 31, 2025 at 05:52:57PM +0800, Qi Zheng wrote: >> >> >> On 12/31/25 5:42 PM, Wei Yang wrote: >>> On Wed, Dec 17, 2025 at 05:45:48PM +0800, Qi Zheng wrote: >>>> From: Qi Zheng <zhengqi.arch@bytedance.com> >>>> >>>> The PT_RECLAIM can work on all architectures that support >>>> MMU_GATHER_RCU_TABLE_FREE, so make PT_RECLAIM depends on >>>> MMU_GATHER_RCU_TABLE_FREE. >>>> >>>> BTW, change PT_RECLAIM to be enabled by default, since nobody should want >>>> to turn it off. >>>> >>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >>>> --- >>>> arch/x86/Kconfig | 1 - >>>> mm/Kconfig | 9 ++------- >>>> 2 files changed, 2 insertions(+), 8 deletions(-) >>>> >>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig >>>> index 80527299f859a..0d22da56a71b0 100644 >>>> --- a/arch/x86/Kconfig >>>> +++ b/arch/x86/Kconfig >>>> @@ -331,7 +331,6 @@ config X86 >>>> select FUNCTION_ALIGNMENT_4B >>>> imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI >>>> select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE >>>> - select ARCH_SUPPORTS_PT_RECLAIM if X86_64 >>>> select ARCH_SUPPORTS_SCHED_SMT if SMP >>>> select SCHED_SMT if SMP >>>> select ARCH_SUPPORTS_SCHED_CLUSTER if SMP >>>> diff --git a/mm/Kconfig b/mm/Kconfig >>>> index bd0ea5454af82..fc00b429b7129 100644 >>>> --- a/mm/Kconfig >>>> +++ b/mm/Kconfig >>>> @@ -1447,14 +1447,9 @@ config ARCH_HAS_USER_SHADOW_STACK >>>> The architecture has hardware support for userspace shadow call >>>> stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss). >>>> >>>> -config ARCH_SUPPORTS_PT_RECLAIM >>>> - def_bool n >>>> - >>>> config PT_RECLAIM >>>> - bool "reclaim empty user page table pages" >>>> - default y >>>> - depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP >>>> - select MMU_GATHER_RCU_TABLE_FREE >>>> + def_bool y >>>> + depends on MMU_GATHER_RCU_TABLE_FREE >>>> help >>>> Try to reclaim empty user page table pages in paths other than munmap >>>> and exit_mmap path. >>> >>> Hi, Qi >>> >>> I am new to PT_RECLAIM, when reading related code I got one question. >>> >>> Before this patch, we could have this config combination: >>> >>> CONFIG_MMU_GATHER_RCU_TABLE_FREE & !CONFIG_PT_RECLAIM >>> >>> This means tlb_remove_table_free() is rcu version while tlb_remove_table_one() >>> is semi rcu version. >>> >>> I am curious could we use rcu version tlb_remove_table_one() for this case? >>> Use rcu version tlb_remove_table_one() if CONFIG_MMU_GATHER_RCU_TABLE_FREE. Is >>> there some limitation here? >> >> I think there's no problem. The rcu version can also ensure that the >> fast GUP works well. >> > > Thanks for your quick response :-) > > And Happy New Year > > So my little suggestion is move the definition of __tlb_remove_table_one() > under CONFIG_MMU_GATHER_RCU_TABLE_FREE. Do you thinks this would be more > clear? 
Do you mean diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c index 2faa23d7f8d42..6aeba4bae68d2 100644 --- a/mm/mmu_gather.c +++ b/mm/mmu_gather.c @@ -319,7 +319,7 @@ static inline void tlb_table_invalidate(struct mmu_gather *tlb) } } -#ifdef CONFIG_PT_RECLAIM +#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE static inline void __tlb_remove_table_one_rcu(struct rcu_head *head) { struct ptdesc *ptdesc; ? -- Cheers David ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE 2026-01-19 10:18 ` David Hildenbrand (Red Hat) @ 2026-01-22 14:00 ` Wei Yang 2026-01-23 3:21 ` Qi Zheng 0 siblings, 1 reply; 23+ messages in thread From: Wei Yang @ 2026-01-22 14:00 UTC (permalink / raw) To: David Hildenbrand (Red Hat) Cc: Wei Yang, Qi Zheng, will, aneesh.kumar, npiggin, peterz, dev.jain, akpm, ioworker0, linmag7, linux-arch, linux-kernel, linux-mm, linux-alpha, loongarch, linux-mips, linux-parisc, linux-um, Qi Zheng On Mon, Jan 19, 2026 at 11:18:52AM +0100, David Hildenbrand (Red Hat) wrote: >On 1/1/26 03:07, Wei Yang wrote: >> On Wed, Dec 31, 2025 at 05:52:57PM +0800, Qi Zheng wrote: >> > >> > >> > On 12/31/25 5:42 PM, Wei Yang wrote: >> > > On Wed, Dec 17, 2025 at 05:45:48PM +0800, Qi Zheng wrote: >> > > > From: Qi Zheng <zhengqi.arch@bytedance.com> >> > > > >> > > > The PT_RECLAIM can work on all architectures that support >> > > > MMU_GATHER_RCU_TABLE_FREE, so make PT_RECLAIM depends on >> > > > MMU_GATHER_RCU_TABLE_FREE. >> > > > >> > > > BTW, change PT_RECLAIM to be enabled by default, since nobody should want >> > > > to turn it off. >> > > > >> > > > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >> > > > --- >> > > > arch/x86/Kconfig | 1 - >> > > > mm/Kconfig | 9 ++------- >> > > > 2 files changed, 2 insertions(+), 8 deletions(-) >> > > > >> > > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig >> > > > index 80527299f859a..0d22da56a71b0 100644 >> > > > --- a/arch/x86/Kconfig >> > > > +++ b/arch/x86/Kconfig >> > > > @@ -331,7 +331,6 @@ config X86 >> > > > select FUNCTION_ALIGNMENT_4B >> > > > imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI >> > > > select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE >> > > > - select ARCH_SUPPORTS_PT_RECLAIM if X86_64 >> > > > select ARCH_SUPPORTS_SCHED_SMT if SMP >> > > > select SCHED_SMT if SMP >> > > > select ARCH_SUPPORTS_SCHED_CLUSTER if SMP >> > > > diff --git a/mm/Kconfig b/mm/Kconfig >> > > > index bd0ea5454af82..fc00b429b7129 100644 >> > > > --- a/mm/Kconfig >> > > > +++ b/mm/Kconfig >> > > > @@ -1447,14 +1447,9 @@ config ARCH_HAS_USER_SHADOW_STACK >> > > > The architecture has hardware support for userspace shadow call >> > > > stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss). >> > > > >> > > > -config ARCH_SUPPORTS_PT_RECLAIM >> > > > - def_bool n >> > > > - >> > > > config PT_RECLAIM >> > > > - bool "reclaim empty user page table pages" >> > > > - default y >> > > > - depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP >> > > > - select MMU_GATHER_RCU_TABLE_FREE >> > > > + def_bool y >> > > > + depends on MMU_GATHER_RCU_TABLE_FREE >> > > > help >> > > > Try to reclaim empty user page table pages in paths other than munmap >> > > > and exit_mmap path. >> > > >> > > Hi, Qi >> > > >> > > I am new to PT_RECLAIM, when reading related code I got one question. >> > > >> > > Before this patch, we could have this config combination: >> > > >> > > CONFIG_MMU_GATHER_RCU_TABLE_FREE & !CONFIG_PT_RECLAIM >> > > >> > > This means tlb_remove_table_free() is rcu version while tlb_remove_table_one() >> > > is semi rcu version. >> > > >> > > I am curious could we use rcu version tlb_remove_table_one() for this case? >> > > Use rcu version tlb_remove_table_one() if CONFIG_MMU_GATHER_RCU_TABLE_FREE. Is >> > > there some limitation here? >> > >> > I think there's no problem. The rcu version can also ensure that the >> > fast GUP works well. 
>> > >> >> Thanks for your quick response :-) >> >> And Happy New Year >> >> So my little suggestion is move the definition of __tlb_remove_table_one() >> under CONFIG_MMU_GATHER_RCU_TABLE_FREE. Do you thinks this would be more >> clear? > > >Do you mean > >diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c >index 2faa23d7f8d42..6aeba4bae68d2 100644 >--- a/mm/mmu_gather.c >+++ b/mm/mmu_gather.c >@@ -319,7 +319,7 @@ static inline void tlb_table_invalidate(struct mmu_gather >*tlb) > } > } > >-#ifdef CONFIG_PT_RECLAIM >+#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE > static inline void __tlb_remove_table_one_rcu(struct rcu_head *head) > { > struct ptdesc *ptdesc; > >? Sorry for the late reply. Yes, and maybe we can move the definition to the #ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE code block above, then to be next to tlb_remove_table_free(). So that we always have rcu version when CONFIG_MMU_GATHER_RCU_TABLE_FREE. > >-- >Cheers > >David -- Wei Yang Help you, Help me ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE 2026-01-22 14:00 ` Wei Yang @ 2026-01-23 3:21 ` Qi Zheng 2026-01-24 1:45 ` Wei Yang 0 siblings, 1 reply; 23+ messages in thread From: Qi Zheng @ 2026-01-23 3:21 UTC (permalink / raw) To: Wei Yang, David Hildenbrand (Red Hat) Cc: will, aneesh.kumar, npiggin, peterz, dev.jain, akpm, ioworker0, linmag7, linux-arch, linux-kernel, linux-mm, linux-alpha, loongarch, linux-mips, linux-parisc, linux-um, Qi Zheng On 1/22/26 10:00 PM, Wei Yang wrote: > On Mon, Jan 19, 2026 at 11:18:52AM +0100, David Hildenbrand (Red Hat) wrote: >> On 1/1/26 03:07, Wei Yang wrote: >>> On Wed, Dec 31, 2025 at 05:52:57PM +0800, Qi Zheng wrote: >>>> >>>> >>>> On 12/31/25 5:42 PM, Wei Yang wrote: >>>>> On Wed, Dec 17, 2025 at 05:45:48PM +0800, Qi Zheng wrote: >>>>>> From: Qi Zheng <zhengqi.arch@bytedance.com> >>>>>> >>>>>> The PT_RECLAIM can work on all architectures that support >>>>>> MMU_GATHER_RCU_TABLE_FREE, so make PT_RECLAIM depends on >>>>>> MMU_GATHER_RCU_TABLE_FREE. >>>>>> >>>>>> BTW, change PT_RECLAIM to be enabled by default, since nobody should want >>>>>> to turn it off. >>>>>> >>>>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >>>>>> --- >>>>>> arch/x86/Kconfig | 1 - >>>>>> mm/Kconfig | 9 ++------- >>>>>> 2 files changed, 2 insertions(+), 8 deletions(-) >>>>>> >>>>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig >>>>>> index 80527299f859a..0d22da56a71b0 100644 >>>>>> --- a/arch/x86/Kconfig >>>>>> +++ b/arch/x86/Kconfig >>>>>> @@ -331,7 +331,6 @@ config X86 >>>>>> select FUNCTION_ALIGNMENT_4B >>>>>> imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI >>>>>> select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE >>>>>> - select ARCH_SUPPORTS_PT_RECLAIM if X86_64 >>>>>> select ARCH_SUPPORTS_SCHED_SMT if SMP >>>>>> select SCHED_SMT if SMP >>>>>> select ARCH_SUPPORTS_SCHED_CLUSTER if SMP >>>>>> diff --git a/mm/Kconfig b/mm/Kconfig >>>>>> index bd0ea5454af82..fc00b429b7129 100644 >>>>>> --- a/mm/Kconfig >>>>>> +++ b/mm/Kconfig >>>>>> @@ -1447,14 +1447,9 @@ config ARCH_HAS_USER_SHADOW_STACK >>>>>> The architecture has hardware support for userspace shadow call >>>>>> stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss). >>>>>> >>>>>> -config ARCH_SUPPORTS_PT_RECLAIM >>>>>> - def_bool n >>>>>> - >>>>>> config PT_RECLAIM >>>>>> - bool "reclaim empty user page table pages" >>>>>> - default y >>>>>> - depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP >>>>>> - select MMU_GATHER_RCU_TABLE_FREE >>>>>> + def_bool y >>>>>> + depends on MMU_GATHER_RCU_TABLE_FREE >>>>>> help >>>>>> Try to reclaim empty user page table pages in paths other than munmap >>>>>> and exit_mmap path. >>>>> >>>>> Hi, Qi >>>>> >>>>> I am new to PT_RECLAIM, when reading related code I got one question. >>>>> >>>>> Before this patch, we could have this config combination: >>>>> >>>>> CONFIG_MMU_GATHER_RCU_TABLE_FREE & !CONFIG_PT_RECLAIM >>>>> >>>>> This means tlb_remove_table_free() is rcu version while tlb_remove_table_one() >>>>> is semi rcu version. >>>>> >>>>> I am curious could we use rcu version tlb_remove_table_one() for this case? >>>>> Use rcu version tlb_remove_table_one() if CONFIG_MMU_GATHER_RCU_TABLE_FREE. Is >>>>> there some limitation here? >>>> >>>> I think there's no problem. The rcu version can also ensure that the >>>> fast GUP works well. >>>> >>> >>> Thanks for your quick response :-) >>> >>> And Happy New Year >>> >>> So my little suggestion is move the definition of __tlb_remove_table_one() >>> under CONFIG_MMU_GATHER_RCU_TABLE_FREE. 
Do you thinks this would be more >>> clear? >> >> >> Do you mean >> >> diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c >> index 2faa23d7f8d42..6aeba4bae68d2 100644 >> --- a/mm/mmu_gather.c >> +++ b/mm/mmu_gather.c >> @@ -319,7 +319,7 @@ static inline void tlb_table_invalidate(struct mmu_gather >> *tlb) >> } >> } >> >> -#ifdef CONFIG_PT_RECLAIM >> +#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE >> static inline void __tlb_remove_table_one_rcu(struct rcu_head *head) >> { >> struct ptdesc *ptdesc; >> >> ? > > Sorry for the late reply. > > Yes, and maybe we can move the definition to the > #ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE code block above, then to be next to > tlb_remove_table_free(). > > So that we always have rcu version when CONFIG_MMU_GATHER_RCU_TABLE_FREE. LGTM, could you help submit an official patch? Thanks, Qi > >> >> -- >> Cheers >> >> David > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE 2026-01-23 3:21 ` Qi Zheng @ 2026-01-24 1:45 ` Wei Yang 0 siblings, 0 replies; 23+ messages in thread From: Wei Yang @ 2026-01-24 1:45 UTC (permalink / raw) To: Qi Zheng Cc: Wei Yang, David Hildenbrand (Red Hat), will, aneesh.kumar, npiggin, peterz, dev.jain, akpm, ioworker0, linmag7, linux-arch, linux-kernel, linux-mm, linux-alpha, loongarch, linux-mips, linux-parisc, linux-um, Qi Zheng On Fri, Jan 23, 2026 at 11:21:50AM +0800, Qi Zheng wrote: > > >On 1/22/26 10:00 PM, Wei Yang wrote: >> On Mon, Jan 19, 2026 at 11:18:52AM +0100, David Hildenbrand (Red Hat) wrote: >> > On 1/1/26 03:07, Wei Yang wrote: >> > > On Wed, Dec 31, 2025 at 05:52:57PM +0800, Qi Zheng wrote: >> > > > >> > > > >> > > > On 12/31/25 5:42 PM, Wei Yang wrote: >> > > > > On Wed, Dec 17, 2025 at 05:45:48PM +0800, Qi Zheng wrote: >> > > > > > From: Qi Zheng <zhengqi.arch@bytedance.com> >> > > > > > >> > > > > > The PT_RECLAIM can work on all architectures that support >> > > > > > MMU_GATHER_RCU_TABLE_FREE, so make PT_RECLAIM depends on >> > > > > > MMU_GATHER_RCU_TABLE_FREE. >> > > > > > >> > > > > > BTW, change PT_RECLAIM to be enabled by default, since nobody should want >> > > > > > to turn it off. >> > > > > > >> > > > > > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >> > > > > > --- >> > > > > > arch/x86/Kconfig | 1 - >> > > > > > mm/Kconfig | 9 ++------- >> > > > > > 2 files changed, 2 insertions(+), 8 deletions(-) >> > > > > > >> > > > > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig >> > > > > > index 80527299f859a..0d22da56a71b0 100644 >> > > > > > --- a/arch/x86/Kconfig >> > > > > > +++ b/arch/x86/Kconfig >> > > > > > @@ -331,7 +331,6 @@ config X86 >> > > > > > select FUNCTION_ALIGNMENT_4B >> > > > > > imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI >> > > > > > select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE >> > > > > > - select ARCH_SUPPORTS_PT_RECLAIM if X86_64 >> > > > > > select ARCH_SUPPORTS_SCHED_SMT if SMP >> > > > > > select SCHED_SMT if SMP >> > > > > > select ARCH_SUPPORTS_SCHED_CLUSTER if SMP >> > > > > > diff --git a/mm/Kconfig b/mm/Kconfig >> > > > > > index bd0ea5454af82..fc00b429b7129 100644 >> > > > > > --- a/mm/Kconfig >> > > > > > +++ b/mm/Kconfig >> > > > > > @@ -1447,14 +1447,9 @@ config ARCH_HAS_USER_SHADOW_STACK >> > > > > > The architecture has hardware support for userspace shadow call >> > > > > > stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss). >> > > > > > >> > > > > > -config ARCH_SUPPORTS_PT_RECLAIM >> > > > > > - def_bool n >> > > > > > - >> > > > > > config PT_RECLAIM >> > > > > > - bool "reclaim empty user page table pages" >> > > > > > - default y >> > > > > > - depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP >> > > > > > - select MMU_GATHER_RCU_TABLE_FREE >> > > > > > + def_bool y >> > > > > > + depends on MMU_GATHER_RCU_TABLE_FREE >> > > > > > help >> > > > > > Try to reclaim empty user page table pages in paths other than munmap >> > > > > > and exit_mmap path. >> > > > > >> > > > > Hi, Qi >> > > > > >> > > > > I am new to PT_RECLAIM, when reading related code I got one question. >> > > > > >> > > > > Before this patch, we could have this config combination: >> > > > > >> > > > > CONFIG_MMU_GATHER_RCU_TABLE_FREE & !CONFIG_PT_RECLAIM >> > > > > >> > > > > This means tlb_remove_table_free() is rcu version while tlb_remove_table_one() >> > > > > is semi rcu version. >> > > > > >> > > > > I am curious could we use rcu version tlb_remove_table_one() for this case? 
>> > > > > Use rcu version tlb_remove_table_one() if CONFIG_MMU_GATHER_RCU_TABLE_FREE. Is >> > > > > there some limitation here? >> > > > >> > > > I think there's no problem. The rcu version can also ensure that the >> > > > fast GUP works well. >> > > > >> > > >> > > Thanks for your quick response :-) >> > > >> > > And Happy New Year >> > > >> > > So my little suggestion is move the definition of __tlb_remove_table_one() >> > > under CONFIG_MMU_GATHER_RCU_TABLE_FREE. Do you thinks this would be more >> > > clear? >> > >> > >> > Do you mean >> > >> > diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c >> > index 2faa23d7f8d42..6aeba4bae68d2 100644 >> > --- a/mm/mmu_gather.c >> > +++ b/mm/mmu_gather.c >> > @@ -319,7 +319,7 @@ static inline void tlb_table_invalidate(struct mmu_gather >> > *tlb) >> > } >> > } >> > >> > -#ifdef CONFIG_PT_RECLAIM >> > +#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE >> > static inline void __tlb_remove_table_one_rcu(struct rcu_head *head) >> > { >> > struct ptdesc *ptdesc; >> > >> > ? >> >> Sorry for the late reply. >> >> Yes, and maybe we can move the definition to the >> #ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE code block above, then to be next to >> tlb_remove_table_free(). >> >> So that we always have rcu version when CONFIG_MMU_GATHER_RCU_TABLE_FREE. > >LGTM, could you help submit an official patch? > Sure. Since this is trivial cleanup, I will post it till next merge window. >Thanks, >Qi > >> >> > >> > -- >> > Cheers >> > >> > David >> -- Wei Yang Help you, Help me ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE 2025-12-17 9:45 ` [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE Qi Zheng 2025-12-31 9:42 ` Wei Yang @ 2026-01-18 11:23 ` David Hildenbrand (Red Hat) 2026-01-19 3:50 ` Qi Zheng 2026-01-19 10:20 ` David Hildenbrand (Red Hat) 2026-01-23 15:15 ` Andreas Larsson 3 siblings, 1 reply; 23+ messages in thread From: David Hildenbrand (Red Hat) @ 2026-01-18 11:23 UTC (permalink / raw) To: Qi Zheng, will, aneesh.kumar, npiggin, peterz, dev.jain, akpm, ioworker0, linmag7 Cc: linux-arch, linux-kernel, linux-mm, linux-alpha, loongarch, linux-mips, linux-parisc, linux-um, Qi Zheng On 12/17/25 10:45, Qi Zheng wrote: > From: Qi Zheng <zhengqi.arch@bytedance.com> > > The PT_RECLAIM can work on all architectures that support > MMU_GATHER_RCU_TABLE_FREE, so make PT_RECLAIM depends on > MMU_GATHER_RCU_TABLE_FREE. > > BTW, change PT_RECLAIM to be enabled by default, since nobody should want > to turn it off. > > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> > --- > arch/x86/Kconfig | 1 - > mm/Kconfig | 9 ++------- > 2 files changed, 2 insertions(+), 8 deletions(-) > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index 80527299f859a..0d22da56a71b0 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -331,7 +331,6 @@ config X86 > select FUNCTION_ALIGNMENT_4B > imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI > select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE > - select ARCH_SUPPORTS_PT_RECLAIM if X86_64 > select ARCH_SUPPORTS_SCHED_SMT if SMP > select SCHED_SMT if SMP > select ARCH_SUPPORTS_SCHED_CLUSTER if SMP > diff --git a/mm/Kconfig b/mm/Kconfig > index bd0ea5454af82..fc00b429b7129 100644 > --- a/mm/Kconfig > +++ b/mm/Kconfig > @@ -1447,14 +1447,9 @@ config ARCH_HAS_USER_SHADOW_STACK > The architecture has hardware support for userspace shadow call > stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss). > > -config ARCH_SUPPORTS_PT_RECLAIM > - def_bool n > - > config PT_RECLAIM > - bool "reclaim empty user page table pages" > - default y > - depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP > - select MMU_GATHER_RCU_TABLE_FREE > + def_bool y > + depends on MMU_GATHER_RCU_TABLE_FREE > help > Try to reclaim empty user page table pages in paths other than munmap > and exit_mmap path. This patch seems to make s390x compilations sometimes unhappy: Unverified Warning (likely false positive, kindly check if interested): mm/memory.c:1911 zap_pte_range() error: uninitialized symbol 'pmdval'. Warning ids grouped by kconfigs: recent_errors `-- s390-randconfig-r072-20260117 `-- mm-memory.c-zap_pte_range()-error:uninitialized-symbol-pmdval-. I assume the compiler is not able to figure out that only when try_get_and_clear_pmd() returns false that pmdval could be uninitialized. Maybe it has to do with LTO? After all, that function resides in a different compilation unit. Which makes me wonder whether we want to just move try_get_and_clear_pmd() and reclaim_pt_is_enabled() to internal.h or even just memory.c? But then, maybe we could remove pt_reclaim.c completely and just have try_to_free_pte() in memory.c as well? I would just do the following cleanup: From cfe97092f71fcc88f729f07ee0bc6816e3e398f0 Mon Sep 17 00:00:00 2001 From: "David Hildenbrand (Red Hat)" <david@kernel.org> Date: Sun, 18 Jan 2026 12:20:55 +0100 Subject: [PATCH] mm: move pte table reclaim code to memory.c Let's move the code and clean it up a bit along the way. 
Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> --- MAINTAINERS | 1 - mm/internal.h | 18 ------------- mm/memory.c | 70 ++++++++++++++++++++++++++++++++++++++++++----- mm/pt_reclaim.c | 72 ------------------------------------------------- 4 files changed, 64 insertions(+), 97 deletions(-) delete mode 100644 mm/pt_reclaim.c diff --git a/MAINTAINERS b/MAINTAINERS index 11720728d92f2..28e8e28bca3e5 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16692,7 +16692,6 @@ R: Shakeel Butt <shakeel.butt@linux.dev> R: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> L: linux-mm@kvack.org S: Maintained -F: mm/pt_reclaim.c F: mm/vmscan.c F: mm/workingset.c diff --git a/mm/internal.h b/mm/internal.h index 9508dbaf47cd4..ef71a1d9991f2 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1745,24 +1745,6 @@ int walk_page_range_debug(struct mm_struct *mm, unsigned long start, unsigned long end, const struct mm_walk_ops *ops, pgd_t *pgd, void *private); -/* pt_reclaim.c */ -bool try_get_and_clear_pmd(struct mm_struct *mm, pmd_t *pmd, pmd_t *pmdval); -void free_pte(struct mm_struct *mm, unsigned long addr, struct mmu_gather *tlb, - pmd_t pmdval); -void try_to_free_pte(struct mm_struct *mm, pmd_t *pmd, unsigned long addr, - struct mmu_gather *tlb); - -#ifdef CONFIG_PT_RECLAIM -bool reclaim_pt_is_enabled(unsigned long start, unsigned long end, - struct zap_details *details); -#else -static inline bool reclaim_pt_is_enabled(unsigned long start, unsigned long end, - struct zap_details *details) -{ - return false; -} -#endif /* CONFIG_PT_RECLAIM */ - void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm); int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm); diff --git a/mm/memory.c b/mm/memory.c index f2e9e05388743..a09226761a07f 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1824,11 +1824,68 @@ static inline int do_zap_pte_range(struct mmu_gather *tlb, return nr; } +static bool pte_table_reclaim_enabled(unsigned long start, unsigned long end, + struct zap_details *details) +{ + if (!IS_ENABLED(CONFIG_PT_RECLAIM)) + return false; + return details && details->reclaim_pt && (end - start >= PMD_SIZE); +} + +static bool zap_empty_pte_table(struct mm_struct *mm, pmd_t *pmd, pmd_t *pmdval) +{ + spinlock_t *pml = pmd_lockptr(mm, pmd); + + if (!spin_trylock(pml)) + return false; + + *pmdval = pmdp_get_lockless(pmd); + pmd_clear(pmd); + spin_unlock(pml); + + return true; +} + +static bool zap_pte_table_if_empty(struct mm_struct *mm, pmd_t *pmd, + unsigned long addr, pmd_t *pmdval) +{ + spinlock_t *pml, *ptl = NULL; + pte_t *start_pte, *pte; + int i; + + pml = pmd_lock(mm, pmd); + start_pte = pte_offset_map_rw_nolock(mm, pmd, addr, pmdval, &ptl); + if (!start_pte) + goto out_ptl; + if (ptl != pml) + spin_lock_nested(ptl, SINGLE_DEPTH_NESTING); + + for (i = 0, pte = start_pte; i < PTRS_PER_PTE; i++, pte++) { + if (!pte_none(ptep_get(pte))) + goto out_ptl; + } + pte_unmap(start_pte); + + pmd_clear(pmd); + + if (ptl != pml) + spin_unlock(ptl); + spin_unlock(pml); + return true; +out_ptl: + if (start_pte) + pte_unmap_unlock(start_pte, ptl); + if (ptl != pml) + spin_unlock(pml); + return false; +} + static unsigned long zap_pte_range(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, unsigned long end, struct zap_details *details) { + bool can_reclaim_pt = pte_table_reclaim_enabled(addr, end, details); bool force_flush = false, force_break = false; struct mm_struct *mm = tlb->mm; int rss[NR_MM_COUNTERS]; @@ -1837,7 +1894,6 @@ static unsigned long zap_pte_range(struct 
mmu_gather *tlb, pte_t *pte; pmd_t pmdval; unsigned long start = addr; - bool can_reclaim_pt = reclaim_pt_is_enabled(start, end, details); bool direct_reclaim = true; int nr; @@ -1878,7 +1934,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, * from being repopulated by another thread. */ if (can_reclaim_pt && direct_reclaim && addr == end) - direct_reclaim = try_get_and_clear_pmd(mm, pmd, &pmdval); + direct_reclaim = zap_empty_pte_table(mm, pmd, &pmdval); add_mm_rss_vec(mm, rss); lazy_mmu_mode_disable(); @@ -1907,10 +1963,12 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, } if (can_reclaim_pt) { - if (direct_reclaim) - free_pte(mm, start, tlb, pmdval); - else - try_to_free_pte(mm, pmd, start, tlb); + if (!direct_reclaim) + direct_reclaim = zap_pte_table_if_empty(mm, pmd, start, &pmdval); + if (direct_reclaim) { + pte_free_tlb(tlb, pmd_pgtable(pmdval), addr); + mm_dec_nr_ptes(mm); + } } return addr; diff --git a/mm/pt_reclaim.c b/mm/pt_reclaim.c deleted file mode 100644 index 46771cfff8239..0000000000000 --- a/mm/pt_reclaim.c +++ /dev/null @@ -1,72 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -#include <linux/hugetlb.h> -#include <linux/pgalloc.h> - -#include <asm/tlb.h> - -#include "internal.h" - -bool reclaim_pt_is_enabled(unsigned long start, unsigned long end, - struct zap_details *details) -{ - return details && details->reclaim_pt && (end - start >= PMD_SIZE); -} - -bool try_get_and_clear_pmd(struct mm_struct *mm, pmd_t *pmd, pmd_t *pmdval) -{ - spinlock_t *pml = pmd_lockptr(mm, pmd); - - if (!spin_trylock(pml)) - return false; - - *pmdval = pmdp_get_lockless(pmd); - pmd_clear(pmd); - spin_unlock(pml); - - return true; -} - -void free_pte(struct mm_struct *mm, unsigned long addr, struct mmu_gather *tlb, - pmd_t pmdval) -{ - pte_free_tlb(tlb, pmd_pgtable(pmdval), addr); - mm_dec_nr_ptes(mm); -} - -void try_to_free_pte(struct mm_struct *mm, pmd_t *pmd, unsigned long addr, - struct mmu_gather *tlb) -{ - pmd_t pmdval; - spinlock_t *pml, *ptl = NULL; - pte_t *start_pte, *pte; - int i; - - pml = pmd_lock(mm, pmd); - start_pte = pte_offset_map_rw_nolock(mm, pmd, addr, &pmdval, &ptl); - if (!start_pte) - goto out_ptl; - if (ptl != pml) - spin_lock_nested(ptl, SINGLE_DEPTH_NESTING); - - /* Check if it is empty PTE page */ - for (i = 0, pte = start_pte; i < PTRS_PER_PTE; i++, pte++) { - if (!pte_none(ptep_get(pte))) - goto out_ptl; - } - pte_unmap(start_pte); - - pmd_clear(pmd); - - if (ptl != pml) - spin_unlock(ptl); - spin_unlock(pml); - - free_pte(mm, addr, tlb, pmdval); - - return; -out_ptl: - if (start_pte) - pte_unmap_unlock(start_pte, ptl); - if (ptl != pml) - spin_unlock(pml); -} -- 2.52.0 Completely untested, of course. -- Cheers David ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE 2026-01-18 11:23 ` David Hildenbrand (Red Hat) @ 2026-01-19 3:50 ` Qi Zheng 2026-01-19 10:12 ` David Hildenbrand (Red Hat) 0 siblings, 1 reply; 23+ messages in thread From: Qi Zheng @ 2026-01-19 3:50 UTC (permalink / raw) To: David Hildenbrand (Red Hat), will, aneesh.kumar, npiggin, peterz, dev.jain, akpm, ioworker0, linmag7 Cc: linux-arch, linux-kernel, linux-mm, linux-alpha, loongarch, linux-mips, linux-parisc, linux-um, Qi Zheng On 1/18/26 7:23 PM, David Hildenbrand (Red Hat) wrote: > On 12/17/25 10:45, Qi Zheng wrote: >> From: Qi Zheng <zhengqi.arch@bytedance.com> >> >> The PT_RECLAIM can work on all architectures that support >> MMU_GATHER_RCU_TABLE_FREE, so make PT_RECLAIM depends on >> MMU_GATHER_RCU_TABLE_FREE. >> >> BTW, change PT_RECLAIM to be enabled by default, since nobody should want >> to turn it off. >> >> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >> --- >> arch/x86/Kconfig | 1 - >> mm/Kconfig | 9 ++------- >> 2 files changed, 2 insertions(+), 8 deletions(-) >> >> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig >> index 80527299f859a..0d22da56a71b0 100644 >> --- a/arch/x86/Kconfig >> +++ b/arch/x86/Kconfig >> @@ -331,7 +331,6 @@ config X86 >> select FUNCTION_ALIGNMENT_4B >> imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI >> select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE >> - select ARCH_SUPPORTS_PT_RECLAIM if X86_64 >> select ARCH_SUPPORTS_SCHED_SMT if SMP >> select SCHED_SMT if SMP >> select ARCH_SUPPORTS_SCHED_CLUSTER if SMP >> diff --git a/mm/Kconfig b/mm/Kconfig >> index bd0ea5454af82..fc00b429b7129 100644 >> --- a/mm/Kconfig >> +++ b/mm/Kconfig >> @@ -1447,14 +1447,9 @@ config ARCH_HAS_USER_SHADOW_STACK >> The architecture has hardware support for userspace shadow call >> stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss). >> -config ARCH_SUPPORTS_PT_RECLAIM >> - def_bool n >> - >> config PT_RECLAIM >> - bool "reclaim empty user page table pages" >> - default y >> - depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP >> - select MMU_GATHER_RCU_TABLE_FREE >> + def_bool y >> + depends on MMU_GATHER_RCU_TABLE_FREE >> help >> Try to reclaim empty user page table pages in paths other than >> munmap >> and exit_mmap path. > > This patch seems to make s390x compilations sometimes unhappy: > > Unverified Warning (likely false positive, kindly check if interested): I believe it is a false positive. > > mm/memory.c:1911 zap_pte_range() error: uninitialized symbol 'pmdval'. > > Warning ids grouped by kconfigs: > > recent_errors > `-- s390-randconfig-r072-20260117 > `-- mm-memory.c-zap_pte_range()-error:uninitialized-symbol-pmdval-. > > I assume the compiler is not able to figure out that only when > try_get_and_clear_pmd() returns false that pmdval could be uninitialized. > > Maybe it has to do with LTO? > > > After all, that function resides in a different compilation unit. > > Which makes me wonder whether we want to just move try_get_and_clear_pmd() > and reclaim_pt_is_enabled() to internal.h or even just memory.c? > > But then, maybe we could remove pt_reclaim.c completely and just have > try_to_free_pte() in memory.c as well? > > > I would just do the following cleanup: > > From cfe97092f71fcc88f729f07ee0bc6816e3e398f0 Mon Sep 17 00:00:00 2001 > From: "David Hildenbrand (Red Hat)" <david@kernel.org> > Date: Sun, 18 Jan 2026 12:20:55 +0100 > Subject: [PATCH] mm: move pte table reclaim code to memory.c > > Let's move the code and clean it up a bit along the way. 
> > Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> > --- > MAINTAINERS | 1 - > mm/internal.h | 18 ------------- > mm/memory.c | 70 ++++++++++++++++++++++++++++++++++++++++++----- > mm/pt_reclaim.c | 72 ------------------------------------------------- > 4 files changed, 64 insertions(+), 97 deletions(-) > delete mode 100644 mm/pt_reclaim.c Make sense, and LGTM. The reason it was placed in mm/pt_reclaim.c before was because there would be other paths calling these functions in the future. However, it can be separated out or put into a header file when there are actually such callers. would you be willing to send out an official patch? Thanks, Qi > > diff --git a/MAINTAINERS b/MAINTAINERS > index 11720728d92f2..28e8e28bca3e5 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -16692,7 +16692,6 @@ R: Shakeel Butt <shakeel.butt@linux.dev> > R: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > L: linux-mm@kvack.org > S: Maintained > -F: mm/pt_reclaim.c > F: mm/vmscan.c > F: mm/workingset.c > > diff --git a/mm/internal.h b/mm/internal.h > index 9508dbaf47cd4..ef71a1d9991f2 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -1745,24 +1745,6 @@ int walk_page_range_debug(struct mm_struct *mm, > unsigned long start, > unsigned long end, const struct mm_walk_ops *ops, > pgd_t *pgd, void *private); > > -/* pt_reclaim.c */ > -bool try_get_and_clear_pmd(struct mm_struct *mm, pmd_t *pmd, pmd_t > *pmdval); > -void free_pte(struct mm_struct *mm, unsigned long addr, struct > mmu_gather *tlb, > - pmd_t pmdval); > -void try_to_free_pte(struct mm_struct *mm, pmd_t *pmd, unsigned long addr, > - struct mmu_gather *tlb); > - > -#ifdef CONFIG_PT_RECLAIM > -bool reclaim_pt_is_enabled(unsigned long start, unsigned long end, > - struct zap_details *details); > -#else > -static inline bool reclaim_pt_is_enabled(unsigned long start, unsigned > long end, > - struct zap_details *details) > -{ > - return false; > -} > -#endif /* CONFIG_PT_RECLAIM */ > - > void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm); > int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm); > > diff --git a/mm/memory.c b/mm/memory.c > index f2e9e05388743..a09226761a07f 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -1824,11 +1824,68 @@ static inline int do_zap_pte_range(struct > mmu_gather *tlb, > return nr; > } > > +static bool pte_table_reclaim_enabled(unsigned long start, unsigned > long end, > + struct zap_details *details) > +{ > + if (!IS_ENABLED(CONFIG_PT_RECLAIM)) > + return false; > + return details && details->reclaim_pt && (end - start >= PMD_SIZE); > +} > + > +static bool zap_empty_pte_table(struct mm_struct *mm, pmd_t *pmd, pmd_t > *pmdval) > +{ > + spinlock_t *pml = pmd_lockptr(mm, pmd); > + > + if (!spin_trylock(pml)) > + return false; > + > + *pmdval = pmdp_get_lockless(pmd); > + pmd_clear(pmd); > + spin_unlock(pml); > + > + return true; > +} > + > +static bool zap_pte_table_if_empty(struct mm_struct *mm, pmd_t *pmd, > + unsigned long addr, pmd_t *pmdval) > +{ > + spinlock_t *pml, *ptl = NULL; > + pte_t *start_pte, *pte; > + int i; > + > + pml = pmd_lock(mm, pmd); > + start_pte = pte_offset_map_rw_nolock(mm, pmd, addr, pmdval, &ptl); > + if (!start_pte) > + goto out_ptl; > + if (ptl != pml) > + spin_lock_nested(ptl, SINGLE_DEPTH_NESTING); > + > + for (i = 0, pte = start_pte; i < PTRS_PER_PTE; i++, pte++) { > + if (!pte_none(ptep_get(pte))) > + goto out_ptl; > + } > + pte_unmap(start_pte); > + > + pmd_clear(pmd); > + > + if (ptl != pml) > + spin_unlock(ptl); > + spin_unlock(pml); > + return 
true; > +out_ptl: > + if (start_pte) > + pte_unmap_unlock(start_pte, ptl); > + if (ptl != pml) > + spin_unlock(pml); > + return false; > +} > + > static unsigned long zap_pte_range(struct mmu_gather *tlb, > struct vm_area_struct *vma, pmd_t *pmd, > unsigned long addr, unsigned long end, > struct zap_details *details) > { > + bool can_reclaim_pt = pte_table_reclaim_enabled(addr, end, details); > bool force_flush = false, force_break = false; > struct mm_struct *mm = tlb->mm; > int rss[NR_MM_COUNTERS]; > @@ -1837,7 +1894,6 @@ static unsigned long zap_pte_range(struct > mmu_gather *tlb, > pte_t *pte; > pmd_t pmdval; > unsigned long start = addr; > - bool can_reclaim_pt = reclaim_pt_is_enabled(start, end, details); > bool direct_reclaim = true; > int nr; > > @@ -1878,7 +1934,7 @@ static unsigned long zap_pte_range(struct > mmu_gather *tlb, > * from being repopulated by another thread. > */ > if (can_reclaim_pt && direct_reclaim && addr == end) > - direct_reclaim = try_get_and_clear_pmd(mm, pmd, &pmdval); > + direct_reclaim = zap_empty_pte_table(mm, pmd, &pmdval); > > add_mm_rss_vec(mm, rss); > lazy_mmu_mode_disable(); > @@ -1907,10 +1963,12 @@ static unsigned long zap_pte_range(struct > mmu_gather *tlb, > } > > if (can_reclaim_pt) { > - if (direct_reclaim) > - free_pte(mm, start, tlb, pmdval); > - else > - try_to_free_pte(mm, pmd, start, tlb); > + if (!direct_reclaim) > + direct_reclaim = zap_pte_table_if_empty(mm, pmd, start, > &pmdval); > + if (direct_reclaim) { > + pte_free_tlb(tlb, pmd_pgtable(pmdval), addr); > + mm_dec_nr_ptes(mm); > + } > } > > return addr; > diff --git a/mm/pt_reclaim.c b/mm/pt_reclaim.c > deleted file mode 100644 > index 46771cfff8239..0000000000000 > --- a/mm/pt_reclaim.c > +++ /dev/null > @@ -1,72 +0,0 @@ > -// SPDX-License-Identifier: GPL-2.0 > -#include <linux/hugetlb.h> > -#include <linux/pgalloc.h> > - > -#include <asm/tlb.h> > - > -#include "internal.h" > - > -bool reclaim_pt_is_enabled(unsigned long start, unsigned long end, > - struct zap_details *details) > -{ > - return details && details->reclaim_pt && (end - start >= PMD_SIZE); > -} > - > -bool try_get_and_clear_pmd(struct mm_struct *mm, pmd_t *pmd, pmd_t > *pmdval) > -{ > - spinlock_t *pml = pmd_lockptr(mm, pmd); > - > - if (!spin_trylock(pml)) > - return false; > - > - *pmdval = pmdp_get_lockless(pmd); > - pmd_clear(pmd); > - spin_unlock(pml); > - > - return true; > -} > - > -void free_pte(struct mm_struct *mm, unsigned long addr, struct > mmu_gather *tlb, > - pmd_t pmdval) > -{ > - pte_free_tlb(tlb, pmd_pgtable(pmdval), addr); > - mm_dec_nr_ptes(mm); > -} > - > -void try_to_free_pte(struct mm_struct *mm, pmd_t *pmd, unsigned long addr, > - struct mmu_gather *tlb) > -{ > - pmd_t pmdval; > - spinlock_t *pml, *ptl = NULL; > - pte_t *start_pte, *pte; > - int i; > - > - pml = pmd_lock(mm, pmd); > - start_pte = pte_offset_map_rw_nolock(mm, pmd, addr, &pmdval, &ptl); > - if (!start_pte) > - goto out_ptl; > - if (ptl != pml) > - spin_lock_nested(ptl, SINGLE_DEPTH_NESTING); > - > - /* Check if it is empty PTE page */ > - for (i = 0, pte = start_pte; i < PTRS_PER_PTE; i++, pte++) { > - if (!pte_none(ptep_get(pte))) > - goto out_ptl; > - } > - pte_unmap(start_pte); > - > - pmd_clear(pmd); > - > - if (ptl != pml) > - spin_unlock(ptl); > - spin_unlock(pml); > - > - free_pte(mm, addr, tlb, pmdval); > - > - return; > -out_ptl: > - if (start_pte) > - pte_unmap_unlock(start_pte, ptl); > - if (ptl != pml) > - spin_unlock(pml); > -} ^ permalink raw reply [flat|nested] 23+ messages in thread
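For context, a small userspace sketch of the scenario the reclaim path above targets: zapping a span of at least PMD_SIZE so that fully emptied PTE tables can be freed. That MADV_DONTNEED is the caller which sets details->reclaim_pt is an assumption based on existing PT_RECLAIM behaviour, not something shown in the patch above.

```
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 4UL << 20;	/* 4 MiB, at least PMD_SIZE on common configurations */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;

	memset(p, 1, len);			/* populate the PTEs */
	madvise(p, len, MADV_DONTNEED);		/* zap them; empty PTE tables become reclaimable */
	munmap(p, len);
	return 0;
}
```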
* Re: [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE 2026-01-19 3:50 ` Qi Zheng @ 2026-01-19 10:12 ` David Hildenbrand (Red Hat) 0 siblings, 0 replies; 23+ messages in thread From: David Hildenbrand (Red Hat) @ 2026-01-19 10:12 UTC (permalink / raw) To: Qi Zheng, will, aneesh.kumar, npiggin, peterz, dev.jain, akpm, ioworker0, linmag7 Cc: linux-arch, linux-kernel, linux-mm, linux-alpha, loongarch, linux-mips, linux-parisc, linux-um, Qi Zheng On 1/19/26 04:50, Qi Zheng wrote: > > > On 1/18/26 7:23 PM, David Hildenbrand (Red Hat) wrote: >> On 12/17/25 10:45, Qi Zheng wrote: >>> From: Qi Zheng <zhengqi.arch@bytedance.com> >>> >>> The PT_RECLAIM can work on all architectures that support >>> MMU_GATHER_RCU_TABLE_FREE, so make PT_RECLAIM depends on >>> MMU_GATHER_RCU_TABLE_FREE. >>> >>> BTW, change PT_RECLAIM to be enabled by default, since nobody should want >>> to turn it off. >>> >>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >>> --- >>> arch/x86/Kconfig | 1 - >>> mm/Kconfig | 9 ++------- >>> 2 files changed, 2 insertions(+), 8 deletions(-) >>> >>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig >>> index 80527299f859a..0d22da56a71b0 100644 >>> --- a/arch/x86/Kconfig >>> +++ b/arch/x86/Kconfig >>> @@ -331,7 +331,6 @@ config X86 >>> select FUNCTION_ALIGNMENT_4B >>> imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI >>> select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE >>> - select ARCH_SUPPORTS_PT_RECLAIM if X86_64 >>> select ARCH_SUPPORTS_SCHED_SMT if SMP >>> select SCHED_SMT if SMP >>> select ARCH_SUPPORTS_SCHED_CLUSTER if SMP >>> diff --git a/mm/Kconfig b/mm/Kconfig >>> index bd0ea5454af82..fc00b429b7129 100644 >>> --- a/mm/Kconfig >>> +++ b/mm/Kconfig >>> @@ -1447,14 +1447,9 @@ config ARCH_HAS_USER_SHADOW_STACK >>> The architecture has hardware support for userspace shadow call >>> stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss). >>> -config ARCH_SUPPORTS_PT_RECLAIM >>> - def_bool n >>> - >>> config PT_RECLAIM >>> - bool "reclaim empty user page table pages" >>> - default y >>> - depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP >>> - select MMU_GATHER_RCU_TABLE_FREE >>> + def_bool y >>> + depends on MMU_GATHER_RCU_TABLE_FREE >>> help >>> Try to reclaim empty user page table pages in paths other than >>> munmap >>> and exit_mmap path. >> >> This patch seems to make s390x compilations sometimes unhappy: >> >> Unverified Warning (likely false positive, kindly check if interested): > > I believe it is a false positive. > >> >> mm/memory.c:1911 zap_pte_range() error: uninitialized symbol 'pmdval'. >> >> Warning ids grouped by kconfigs: >> >> recent_errors >> `-- s390-randconfig-r072-20260117 >> `-- mm-memory.c-zap_pte_range()-error:uninitialized-symbol-pmdval-. >> >> I assume the compiler is not able to figure out that only when >> try_get_and_clear_pmd() returns false that pmdval could be uninitialized. >> >> Maybe it has to do with LTO? >> >> >> After all, that function resides in a different compilation unit. >> >> Which makes me wonder whether we want to just move try_get_and_clear_pmd() >> and reclaim_pt_is_enabled() to internal.h or even just memory.c? >> >> But then, maybe we could remove pt_reclaim.c completely and just have >> try_to_free_pte() in memory.c as well? 
>> >> >> I would just do the following cleanup: >> >> From cfe97092f71fcc88f729f07ee0bc6816e3e398f0 Mon Sep 17 00:00:00 2001 >> From: "David Hildenbrand (Red Hat)" <david@kernel.org> >> Date: Sun, 18 Jan 2026 12:20:55 +0100 >> Subject: [PATCH] mm: move pte table reclaim code to memory.c >> >> Let's move the code and clean it up a bit along the way. >> >> Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> >> --- >> MAINTAINERS | 1 - >> mm/internal.h | 18 ------------- >> mm/memory.c | 70 ++++++++++++++++++++++++++++++++++++++++++----- >> mm/pt_reclaim.c | 72 ------------------------------------------------- >> 4 files changed, 64 insertions(+), 97 deletions(-) >> delete mode 100644 mm/pt_reclaim.c > > Make sense, and LGTM. The reason it was placed in mm/pt_reclaim.c before > was because there would be other paths calling these functions in the > future. However, it can be separated out or put into a header file when > there are actually such callers. Most relevant zapping better happens in memory.c :) There is, of course, zapping due to RMAP unmap, but that mostly targets individual PTEs, and not a complete pte table. Likely, if ever required, we should expose a proper zapping interface from memory.c to other users, assuming the existing one is not suitable. > > would you be willing to send out an official patch? Yes, I can send one out, thanks. -- Cheers David ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE 2025-12-17 9:45 ` [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE Qi Zheng 2025-12-31 9:42 ` Wei Yang 2026-01-18 11:23 ` David Hildenbrand (Red Hat) @ 2026-01-19 10:20 ` David Hildenbrand (Red Hat) 2026-01-23 15:15 ` Andreas Larsson 3 siblings, 0 replies; 23+ messages in thread From: David Hildenbrand (Red Hat) @ 2026-01-19 10:20 UTC (permalink / raw) To: Qi Zheng, will, aneesh.kumar, npiggin, peterz, dev.jain, akpm, ioworker0, linmag7 Cc: linux-arch, linux-kernel, linux-mm, linux-alpha, loongarch, linux-mips, linux-parisc, linux-um, Qi Zheng On 12/17/25 10:45, Qi Zheng wrote: > From: Qi Zheng <zhengqi.arch@bytedance.com> > > The PT_RECLAIM can work on all architectures that support > MMU_GATHER_RCU_TABLE_FREE, so make PT_RECLAIM depends on > MMU_GATHER_RCU_TABLE_FREE. > > BTW, change PT_RECLAIM to be enabled by default, since nobody should want > to turn it off. Right, and if there is ever a need to, I wonder whether that should be a boottime/runtime toggle instead. So far we haven't heard of any relevant runtime overheads that causes problems. > > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> > --- > arch/x86/Kconfig | 1 - > mm/Kconfig | 9 ++------- > 2 files changed, 2 insertions(+), 8 deletions(-) > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index 80527299f859a..0d22da56a71b0 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -331,7 +331,6 @@ config X86 > select FUNCTION_ALIGNMENT_4B > imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI > select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE > - select ARCH_SUPPORTS_PT_RECLAIM if X86_64 > select ARCH_SUPPORTS_SCHED_SMT if SMP > select SCHED_SMT if SMP > select ARCH_SUPPORTS_SCHED_CLUSTER if SMP > diff --git a/mm/Kconfig b/mm/Kconfig > index bd0ea5454af82..fc00b429b7129 100644 > --- a/mm/Kconfig > +++ b/mm/Kconfig > @@ -1447,14 +1447,9 @@ config ARCH_HAS_USER_SHADOW_STACK > The architecture has hardware support for userspace shadow call > stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss). > > -config ARCH_SUPPORTS_PT_RECLAIM > - def_bool n > - > config PT_RECLAIM > - bool "reclaim empty user page table pages" > - default y > - depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP > - select MMU_GATHER_RCU_TABLE_FREE > + def_bool y > + depends on MMU_GATHER_RCU_TABLE_FREE > help > Try to reclaim empty user page table pages in paths other than munmap > and exit_mmap path. Nothing jumped at me. Hopefully we're not missing something important :) Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> -- Cheers David ^ permalink raw reply [flat|nested] 23+ messages in thread
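A purely hypothetical sketch of the boottime toggle mentioned above; the "pt_reclaim=" parameter name and the pt_reclaim_enabled variable do not exist today and are invented here for illustration (a kernel fragment, not a standalone program):

```
#include <linux/cache.h>
#include <linux/init.h>
#include <linux/kstrtox.h>

static bool pt_reclaim_enabled __ro_after_init = true;

static int __init early_pt_reclaim(char *buf)
{
	/* e.g. pt_reclaim=0 on the kernel command line to opt out */
	return kstrtobool(buf, &pt_reclaim_enabled);
}
early_param("pt_reclaim", early_pt_reclaim);
```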
* Re: [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE 2025-12-17 9:45 ` [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE Qi Zheng ` (2 preceding siblings ...) 2026-01-19 10:20 ` David Hildenbrand (Red Hat) @ 2026-01-23 15:15 ` Andreas Larsson 2026-01-26 6:59 ` Qi Zheng 3 siblings, 1 reply; 23+ messages in thread From: Andreas Larsson @ 2026-01-23 15:15 UTC (permalink / raw) To: Qi Zheng, will, aneesh.kumar, npiggin, peterz, dev.jain, akpm, david, ioworker0, linmag7 Cc: linux-arch, linux-kernel, linux-mm, linux-alpha, loongarch, linux-mips, linux-parisc, linux-um, Qi Zheng, sparclinux On 2025-12-17 10:45, Qi Zheng wrote: > From: Qi Zheng <zhengqi.arch@bytedance.com> > > The PT_RECLAIM can work on all architectures that support > MMU_GATHER_RCU_TABLE_FREE, so make PT_RECLAIM depends on > MMU_GATHER_RCU_TABLE_FREE. > > BTW, change PT_RECLAIM to be enabled by default, since nobody should want > to turn it off. > > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> > --- > arch/x86/Kconfig | 1 - > mm/Kconfig | 9 ++------- > 2 files changed, 2 insertions(+), 8 deletions(-) > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index 80527299f859a..0d22da56a71b0 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -331,7 +331,6 @@ config X86 > select FUNCTION_ALIGNMENT_4B > imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI > select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE > - select ARCH_SUPPORTS_PT_RECLAIM if X86_64 > select ARCH_SUPPORTS_SCHED_SMT if SMP > select SCHED_SMT if SMP > select ARCH_SUPPORTS_SCHED_CLUSTER if SMP > diff --git a/mm/Kconfig b/mm/Kconfig > index bd0ea5454af82..fc00b429b7129 100644 > --- a/mm/Kconfig > +++ b/mm/Kconfig > @@ -1447,14 +1447,9 @@ config ARCH_HAS_USER_SHADOW_STACK > The architecture has hardware support for userspace shadow call > stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss). > > -config ARCH_SUPPORTS_PT_RECLAIM > - def_bool n > - > config PT_RECLAIM > - bool "reclaim empty user page table pages" > - default y > - depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP > - select MMU_GATHER_RCU_TABLE_FREE > + def_bool y > + depends on MMU_GATHER_RCU_TABLE_FREE > help > Try to reclaim empty user page table pages in paths other than munmap > and exit_mmap path. 
Hi, This patch unfortunately results in a WARN_ON_ONCE and unaligned accesses on sparc64: $ stress-ng --mmaphuge 20 -t 60 stress-ng: info: [559] setting to a 1 min run per stressor stress-ng: info: [559] dispatching hogs: 20 mmaphuge [ 560.592569] ------------[ cut here ]------------ [ 560.592663] WARNING: kernel/rcu/tree.c:3098 at __call_rcu_common.constprop.0+0x200/0x760, CPU#4: stress-ng-mmaph/568 [ 560.592777] CPU: 4 UID: 1000 PID: 568 Comm: stress-ng-mmaph Not tainted 6.19.0-rc5-00127-g62fc9f6ccb97 #8 VOLUNTARY [ 560.592805] Call Trace: [ 560.592812] [<00000000004368b8>] dump_stack+0x8/0x60 [ 560.592844] [<0000000000482a60>] __warn+0xe0/0x140 [ 560.592878] [<0000000000482b64>] warn_slowpath_fmt+0xa4/0x120 [ 560.592901] [<0000000000526a40>] __call_rcu_common.constprop.0+0x200/0x760 [ 560.592931] [<0000000000526fd0>] call_rcu+0x10/0x20 [ 560.592954] [<0000000000730838>] tlb_remove_table+0x98/0xc0 [ 560.592986] [<000000000071bec4>] free_pgd_range+0x224/0x4c0 [ 560.593021] [<000000000071c35c>] free_pgtables+0x1fc/0x240 [ 560.593042] [<000000000074a6f0>] vms_clear_ptes+0x110/0x140 [ 560.593068] [<000000000074c3dc>] vms_complete_munmap_vmas+0x5c/0x280 [ 560.593094] [<000000000074de5c>] do_vmi_align_munmap+0x1dc/0x260 [ 560.593117] [<000000000074df80>] do_vmi_munmap+0xa0/0x140 [ 560.593142] [<000000000074fb2c>] __vm_munmap+0x8c/0x160 [ 560.593168] [<000000000072cfd4>] vm_munmap+0x14/0x40 [ 560.593190] [<00000000004402a8>] sys_64_munmap+0x88/0xa0 [ 560.593221] [<0000000000406274>] linux_sparc_syscall+0x34/0x44 [ 560.593274] ---[ end trace 0000000000000000 ]--- [ 560.593960] log_unaligned: 209 callbacks suppressed [ 560.593979] Kernel unaligned access at TPC[526a4c] __call_rcu_common.constprop.0+0x20c/0x760 [ 560.594121] Kernel unaligned access at TPC[526864] __call_rcu_common.constprop.0+0x24/0x760 [ 560.594198] Kernel unaligned access at TPC[52b3c4] rcu_segcblist_enqueue+0x24/0x40 [ 560.594275] Kernel unaligned access at TPC[526860] __call_rcu_common.constprop.0+0x20/0x760 [ 560.594360] Kernel unaligned access at TPC[526864] __call_rcu_common.constprop.0+0x24/0x760 [ 567.054127] log_unaligned: 1105 callbacks suppressed [ 567.054167] Kernel unaligned access at TPC[526860] __call_rcu_common.constprop.0+0x20/0x760 [ 567.054331] Kernel unaligned access at TPC[526864] __call_rcu_common.constprop.0+0x24/0x760 [ 567.054410] Kernel unaligned access at TPC[52b3c4] rcu_segcblist_enqueue+0x24/0x40 ... I bisected to this one on mm-unstable from approximately 2026-01-12. The warning is from /* Misaligned rcu_head! */ WARN_ON_ONCE((unsigned long)head & (sizeof(void *) - 1)); in __call_rcu_common() and the unaligned accesses follows from there. Regards, Andreas ^ permalink raw reply [flat|nested] 23+ messages in thread
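As a standalone illustration (not kernel code) of the check that fires above: call_rcu() warns when the rcu_head address is not pointer-aligned, which is exactly what a pointer carrying a tag in its low bit would trip. Whether that is what happens on sparc64 here is left to the analysis in the replies below.

```
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	void *obj = malloc(64);
	/* Low-bit tagging of a table pointer, as discussed later in the thread. */
	uintptr_t tagged = (uintptr_t)obj | 0x1UL;

	/* Mirrors WARN_ON_ONCE((unsigned long)head & (sizeof(void *) - 1)) */
	printf("untagged: %lu\n", (unsigned long)((uintptr_t)obj & (sizeof(void *) - 1)));	/* 0 */
	printf("tagged:   %lu\n", (unsigned long)(tagged & (sizeof(void *) - 1)));		/* non-zero, would WARN */

	free(obj);
	return 0;
}
```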
* Re: [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE 2026-01-23 15:15 ` Andreas Larsson @ 2026-01-26 6:59 ` Qi Zheng 2026-01-27 11:29 ` David Hildenbrand (Red Hat) 0 siblings, 1 reply; 23+ messages in thread From: Qi Zheng @ 2026-01-26 6:59 UTC (permalink / raw) To: Andreas Larsson, david Cc: linux-arch, linux-kernel, linux-mm, linux-alpha, loongarch, linux-mips, linux-parisc, linux-um, Qi Zheng, sparclinux, will, peterz, akpm, aneesh.kumar, npiggin, dev.jain, ioworker0, linmag7 On 1/23/26 11:15 PM, Andreas Larsson wrote: > On 2025-12-17 10:45, Qi Zheng wrote: >> From: Qi Zheng <zhengqi.arch@bytedance.com> >> >> The PT_RECLAIM can work on all architectures that support >> MMU_GATHER_RCU_TABLE_FREE, so make PT_RECLAIM depends on >> MMU_GATHER_RCU_TABLE_FREE. >> >> BTW, change PT_RECLAIM to be enabled by default, since nobody should want >> to turn it off. >> >> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >> --- >> arch/x86/Kconfig | 1 - >> mm/Kconfig | 9 ++------- >> 2 files changed, 2 insertions(+), 8 deletions(-) >> >> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig >> index 80527299f859a..0d22da56a71b0 100644 >> --- a/arch/x86/Kconfig >> +++ b/arch/x86/Kconfig >> @@ -331,7 +331,6 @@ config X86 >> select FUNCTION_ALIGNMENT_4B >> imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI >> select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE >> - select ARCH_SUPPORTS_PT_RECLAIM if X86_64 >> select ARCH_SUPPORTS_SCHED_SMT if SMP >> select SCHED_SMT if SMP >> select ARCH_SUPPORTS_SCHED_CLUSTER if SMP >> diff --git a/mm/Kconfig b/mm/Kconfig >> index bd0ea5454af82..fc00b429b7129 100644 >> --- a/mm/Kconfig >> +++ b/mm/Kconfig >> @@ -1447,14 +1447,9 @@ config ARCH_HAS_USER_SHADOW_STACK >> The architecture has hardware support for userspace shadow call >> stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss). >> >> -config ARCH_SUPPORTS_PT_RECLAIM >> - def_bool n >> - >> config PT_RECLAIM >> - bool "reclaim empty user page table pages" >> - default y >> - depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP >> - select MMU_GATHER_RCU_TABLE_FREE >> + def_bool y >> + depends on MMU_GATHER_RCU_TABLE_FREE >> help >> Try to reclaim empty user page table pages in paths other than munmap >> and exit_mmap path. 
> > Hi, > > This patch unfortunately results in a WARN_ON_ONCE and unaligned > accesses on sparc64: > > $ stress-ng --mmaphuge 20 -t 60 > stress-ng: info: [559] setting to a 1 min run per stressor > stress-ng: info: [559] dispatching hogs: 20 mmaphuge > [ 560.592569] ------------[ cut here ]------------ > [ 560.592663] WARNING: kernel/rcu/tree.c:3098 at __call_rcu_common.constprop.0+0x200/0x760, CPU#4: stress-ng-mmaph/568 > [ 560.592777] CPU: 4 UID: 1000 PID: 568 Comm: stress-ng-mmaph Not tainted 6.19.0-rc5-00127-g62fc9f6ccb97 #8 VOLUNTARY > [ 560.592805] Call Trace: > [ 560.592812] [<00000000004368b8>] dump_stack+0x8/0x60 > [ 560.592844] [<0000000000482a60>] __warn+0xe0/0x140 > [ 560.592878] [<0000000000482b64>] warn_slowpath_fmt+0xa4/0x120 > [ 560.592901] [<0000000000526a40>] __call_rcu_common.constprop.0+0x200/0x760 > [ 560.592931] [<0000000000526fd0>] call_rcu+0x10/0x20 > [ 560.592954] [<0000000000730838>] tlb_remove_table+0x98/0xc0 > [ 560.592986] [<000000000071bec4>] free_pgd_range+0x224/0x4c0 > [ 560.593021] [<000000000071c35c>] free_pgtables+0x1fc/0x240 > [ 560.593042] [<000000000074a6f0>] vms_clear_ptes+0x110/0x140 > [ 560.593068] [<000000000074c3dc>] vms_complete_munmap_vmas+0x5c/0x280 > [ 560.593094] [<000000000074de5c>] do_vmi_align_munmap+0x1dc/0x260 > [ 560.593117] [<000000000074df80>] do_vmi_munmap+0xa0/0x140 > [ 560.593142] [<000000000074fb2c>] __vm_munmap+0x8c/0x160 > [ 560.593168] [<000000000072cfd4>] vm_munmap+0x14/0x40 > [ 560.593190] [<00000000004402a8>] sys_64_munmap+0x88/0xa0 > [ 560.593221] [<0000000000406274>] linux_sparc_syscall+0x34/0x44 > [ 560.593274] ---[ end trace 0000000000000000 ]--- > [ 560.593960] log_unaligned: 209 callbacks suppressed > [ 560.593979] Kernel unaligned access at TPC[526a4c] __call_rcu_common.constprop.0+0x20c/0x760 > [ 560.594121] Kernel unaligned access at TPC[526864] __call_rcu_common.constprop.0+0x24/0x760 > [ 560.594198] Kernel unaligned access at TPC[52b3c4] rcu_segcblist_enqueue+0x24/0x40 > [ 560.594275] Kernel unaligned access at TPC[526860] __call_rcu_common.constprop.0+0x20/0x760 > [ 560.594360] Kernel unaligned access at TPC[526864] __call_rcu_common.constprop.0+0x24/0x760 > [ 567.054127] log_unaligned: 1105 callbacks suppressed > [ 567.054167] Kernel unaligned access at TPC[526860] __call_rcu_common.constprop.0+0x20/0x760 > [ 567.054331] Kernel unaligned access at TPC[526864] __call_rcu_common.constprop.0+0x24/0x760 > [ 567.054410] Kernel unaligned access at TPC[52b3c4] rcu_segcblist_enqueue+0x24/0x40 Thanks for your report! On sparc64, pmd and pud levels are not of struct page: __pmd_free_tlb/__pud_free_tlb --> pgtable_free_tlb(tlb, pud/pmd, false). <=== is_page == false --> tlb_remove_table So in __tlb_remove_table_one(), the table cannot be treated as ptdesc because it does not have an pt_rcu_head member. Hi David, it seems we still need to keep ARCH_SUPPORTS_PT_RECLAIM? Thanks, Qi > ... > > I bisected to this one on mm-unstable from approximately 2026-01-12. > > The warning is from > > /* Misaligned rcu_head! */ > WARN_ON_ONCE((unsigned long)head & (sizeof(void *) - 1)); > > in __call_rcu_common() and the unaligned accesses follows from there. > > Regards, > Andreas > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE 2026-01-26 6:59 ` Qi Zheng @ 2026-01-27 11:29 ` David Hildenbrand (Red Hat) 2026-01-27 11:47 ` Qi Zheng 0 siblings, 1 reply; 23+ messages in thread From: David Hildenbrand (Red Hat) @ 2026-01-27 11:29 UTC (permalink / raw) To: Qi Zheng, Andreas Larsson Cc: linux-arch, linux-kernel, linux-mm, linux-alpha, loongarch, linux-mips, linux-parisc, linux-um, Qi Zheng, sparclinux, will, peterz, akpm, aneesh.kumar, npiggin, dev.jain, ioworker0, linmag7 On 1/26/26 07:59, Qi Zheng wrote: > > > On 1/23/26 11:15 PM, Andreas Larsson wrote: >> On 2025-12-17 10:45, Qi Zheng wrote: >>> From: Qi Zheng <zhengqi.arch@bytedance.com> >>> >>> The PT_RECLAIM can work on all architectures that support >>> MMU_GATHER_RCU_TABLE_FREE, so make PT_RECLAIM depends on >>> MMU_GATHER_RCU_TABLE_FREE. >>> >>> BTW, change PT_RECLAIM to be enabled by default, since nobody should want >>> to turn it off. >>> >>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >>> --- >>> arch/x86/Kconfig | 1 - >>> mm/Kconfig | 9 ++------- >>> 2 files changed, 2 insertions(+), 8 deletions(-) >>> >>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig >>> index 80527299f859a..0d22da56a71b0 100644 >>> --- a/arch/x86/Kconfig >>> +++ b/arch/x86/Kconfig >>> @@ -331,7 +331,6 @@ config X86 >>> select FUNCTION_ALIGNMENT_4B >>> imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI >>> select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE >>> - select ARCH_SUPPORTS_PT_RECLAIM if X86_64 >>> select ARCH_SUPPORTS_SCHED_SMT if SMP >>> select SCHED_SMT if SMP >>> select ARCH_SUPPORTS_SCHED_CLUSTER if SMP >>> diff --git a/mm/Kconfig b/mm/Kconfig >>> index bd0ea5454af82..fc00b429b7129 100644 >>> --- a/mm/Kconfig >>> +++ b/mm/Kconfig >>> @@ -1447,14 +1447,9 @@ config ARCH_HAS_USER_SHADOW_STACK >>> The architecture has hardware support for userspace shadow call >>> stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss). >>> >>> -config ARCH_SUPPORTS_PT_RECLAIM >>> - def_bool n >>> - >>> config PT_RECLAIM >>> - bool "reclaim empty user page table pages" >>> - default y >>> - depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP >>> - select MMU_GATHER_RCU_TABLE_FREE >>> + def_bool y >>> + depends on MMU_GATHER_RCU_TABLE_FREE >>> help >>> Try to reclaim empty user page table pages in paths other than munmap >>> and exit_mmap path. 
>> >> Hi, >> >> This patch unfortunately results in a WARN_ON_ONCE and unaligned >> accesses on sparc64: >> >> $ stress-ng --mmaphuge 20 -t 60 >> stress-ng: info: [559] setting to a 1 min run per stressor >> stress-ng: info: [559] dispatching hogs: 20 mmaphuge >> [ 560.592569] ------------[ cut here ]------------ >> [ 560.592663] WARNING: kernel/rcu/tree.c:3098 at __call_rcu_common.constprop.0+0x200/0x760, CPU#4: stress-ng-mmaph/568 >> [ 560.592777] CPU: 4 UID: 1000 PID: 568 Comm: stress-ng-mmaph Not tainted 6.19.0-rc5-00127-g62fc9f6ccb97 #8 VOLUNTARY >> [ 560.592805] Call Trace: >> [ 560.592812] [<00000000004368b8>] dump_stack+0x8/0x60 >> [ 560.592844] [<0000000000482a60>] __warn+0xe0/0x140 >> [ 560.592878] [<0000000000482b64>] warn_slowpath_fmt+0xa4/0x120 >> [ 560.592901] [<0000000000526a40>] __call_rcu_common.constprop.0+0x200/0x760 >> [ 560.592931] [<0000000000526fd0>] call_rcu+0x10/0x20 >> [ 560.592954] [<0000000000730838>] tlb_remove_table+0x98/0xc0 >> [ 560.592986] [<000000000071bec4>] free_pgd_range+0x224/0x4c0 >> [ 560.593021] [<000000000071c35c>] free_pgtables+0x1fc/0x240 >> [ 560.593042] [<000000000074a6f0>] vms_clear_ptes+0x110/0x140 >> [ 560.593068] [<000000000074c3dc>] vms_complete_munmap_vmas+0x5c/0x280 >> [ 560.593094] [<000000000074de5c>] do_vmi_align_munmap+0x1dc/0x260 >> [ 560.593117] [<000000000074df80>] do_vmi_munmap+0xa0/0x140 >> [ 560.593142] [<000000000074fb2c>] __vm_munmap+0x8c/0x160 >> [ 560.593168] [<000000000072cfd4>] vm_munmap+0x14/0x40 >> [ 560.593190] [<00000000004402a8>] sys_64_munmap+0x88/0xa0 >> [ 560.593221] [<0000000000406274>] linux_sparc_syscall+0x34/0x44 >> [ 560.593274] ---[ end trace 0000000000000000 ]--- >> [ 560.593960] log_unaligned: 209 callbacks suppressed >> [ 560.593979] Kernel unaligned access at TPC[526a4c] __call_rcu_common.constprop.0+0x20c/0x760 >> [ 560.594121] Kernel unaligned access at TPC[526864] __call_rcu_common.constprop.0+0x24/0x760 >> [ 560.594198] Kernel unaligned access at TPC[52b3c4] rcu_segcblist_enqueue+0x24/0x40 >> [ 560.594275] Kernel unaligned access at TPC[526860] __call_rcu_common.constprop.0+0x20/0x760 >> [ 560.594360] Kernel unaligned access at TPC[526864] __call_rcu_common.constprop.0+0x24/0x760 >> [ 567.054127] log_unaligned: 1105 callbacks suppressed >> [ 567.054167] Kernel unaligned access at TPC[526860] __call_rcu_common.constprop.0+0x20/0x760 >> [ 567.054331] Kernel unaligned access at TPC[526864] __call_rcu_common.constprop.0+0x24/0x760 >> [ 567.054410] Kernel unaligned access at TPC[52b3c4] rcu_segcblist_enqueue+0x24/0x40 > > Thanks for your report! > > On sparc64, pmd and pud levels are not of struct page: Can you elaborate, I don't understand what you mean :) Is it also a problem on architectures like s390x and ppc, where we squeeze multiple page tables into a physical pages? > > __pmd_free_tlb/__pud_free_tlb > --> pgtable_free_tlb(tlb, pud/pmd, false). <=== is_page == false > --> tlb_remove_table > > So in __tlb_remove_table_one(), the table cannot be treated as > ptdesc because it does not have an pt_rcu_head member. > > Hi David, it seems we still need to keep ARCH_SUPPORTS_PT_RECLAIM? Or we invert it and only disable it for the known-problematic architectures? -- Cheers David ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v3 7/7] mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE 2026-01-27 11:29 ` David Hildenbrand (Red Hat) @ 2026-01-27 11:47 ` Qi Zheng 0 siblings, 0 replies; 23+ messages in thread From: Qi Zheng @ 2026-01-27 11:47 UTC (permalink / raw) To: David Hildenbrand (Red Hat), Andreas Larsson Cc: linux-arch, linux-kernel, linux-mm, linux-alpha, loongarch, linux-mips, linux-parisc, linux-um, Qi Zheng, sparclinux, will, peterz, akpm, aneesh.kumar, npiggin, dev.jain, ioworker0, linmag7 On 1/27/26 7:29 PM, David Hildenbrand (Red Hat) wrote: > On 1/26/26 07:59, Qi Zheng wrote: >> >> >> On 1/23/26 11:15 PM, Andreas Larsson wrote: >>> On 2025-12-17 10:45, Qi Zheng wrote: >>>> From: Qi Zheng <zhengqi.arch@bytedance.com> >>>> >>>> The PT_RECLAIM can work on all architectures that support >>>> MMU_GATHER_RCU_TABLE_FREE, so make PT_RECLAIM depends on >>>> MMU_GATHER_RCU_TABLE_FREE. >>>> >>>> BTW, change PT_RECLAIM to be enabled by default, since nobody should >>>> want >>>> to turn it off. >>>> >>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >>>> --- >>>> arch/x86/Kconfig | 1 - >>>> mm/Kconfig | 9 ++------- >>>> 2 files changed, 2 insertions(+), 8 deletions(-) >>>> >>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig >>>> index 80527299f859a..0d22da56a71b0 100644 >>>> --- a/arch/x86/Kconfig >>>> +++ b/arch/x86/Kconfig >>>> @@ -331,7 +331,6 @@ config X86 >>>> select FUNCTION_ALIGNMENT_4B >>>> imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI >>>> select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE >>>> - select ARCH_SUPPORTS_PT_RECLAIM if X86_64 >>>> select ARCH_SUPPORTS_SCHED_SMT if SMP >>>> select SCHED_SMT if SMP >>>> select ARCH_SUPPORTS_SCHED_CLUSTER if SMP >>>> diff --git a/mm/Kconfig b/mm/Kconfig >>>> index bd0ea5454af82..fc00b429b7129 100644 >>>> --- a/mm/Kconfig >>>> +++ b/mm/Kconfig >>>> @@ -1447,14 +1447,9 @@ config ARCH_HAS_USER_SHADOW_STACK >>>> The architecture has hardware support for userspace shadow >>>> call >>>> stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss). >>>> -config ARCH_SUPPORTS_PT_RECLAIM >>>> - def_bool n >>>> - >>>> config PT_RECLAIM >>>> - bool "reclaim empty user page table pages" >>>> - default y >>>> - depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP >>>> - select MMU_GATHER_RCU_TABLE_FREE >>>> + def_bool y >>>> + depends on MMU_GATHER_RCU_TABLE_FREE >>>> help >>>> Try to reclaim empty user page table pages in paths other >>>> than munmap >>>> and exit_mmap path. 
>>>
>>> Hi,
>>>
>>> This patch unfortunately results in a WARN_ON_ONCE and unaligned
>>> accesses on sparc64:
>>>
>>> $ stress-ng --mmaphuge 20 -t 60
>>> stress-ng: info: [559] setting to a 1 min run per stressor
>>> stress-ng: info: [559] dispatching hogs: 20 mmaphuge
>>> [ 560.592569] ------------[ cut here ]------------
>>> [ 560.592663] WARNING: kernel/rcu/tree.c:3098 at
>>> __call_rcu_common.constprop.0+0x200/0x760, CPU#4: stress-ng-mmaph/568
>>> [ 560.592777] CPU: 4 UID: 1000 PID: 568 Comm: stress-ng-mmaph Not
>>> tainted 6.19.0-rc5-00127-g62fc9f6ccb97 #8 VOLUNTARY
>>> [ 560.592805] Call Trace:
>>> [ 560.592812] [<00000000004368b8>] dump_stack+0x8/0x60
>>> [ 560.592844] [<0000000000482a60>] __warn+0xe0/0x140
>>> [ 560.592878] [<0000000000482b64>] warn_slowpath_fmt+0xa4/0x120
>>> [ 560.592901] [<0000000000526a40>]
>>> __call_rcu_common.constprop.0+0x200/0x760
>>> [ 560.592931] [<0000000000526fd0>] call_rcu+0x10/0x20
>>> [ 560.592954] [<0000000000730838>] tlb_remove_table+0x98/0xc0
>>> [ 560.592986] [<000000000071bec4>] free_pgd_range+0x224/0x4c0
>>> [ 560.593021] [<000000000071c35c>] free_pgtables+0x1fc/0x240
>>> [ 560.593042] [<000000000074a6f0>] vms_clear_ptes+0x110/0x140
>>> [ 560.593068] [<000000000074c3dc>] vms_complete_munmap_vmas+0x5c/0x280
>>> [ 560.593094] [<000000000074de5c>] do_vmi_align_munmap+0x1dc/0x260
>>> [ 560.593117] [<000000000074df80>] do_vmi_munmap+0xa0/0x140
>>> [ 560.593142] [<000000000074fb2c>] __vm_munmap+0x8c/0x160
>>> [ 560.593168] [<000000000072cfd4>] vm_munmap+0x14/0x40
>>> [ 560.593190] [<00000000004402a8>] sys_64_munmap+0x88/0xa0
>>> [ 560.593221] [<0000000000406274>] linux_sparc_syscall+0x34/0x44
>>> [ 560.593274] ---[ end trace 0000000000000000 ]---
>>> [ 560.593960] log_unaligned: 209 callbacks suppressed
>>> [ 560.593979] Kernel unaligned access at TPC[526a4c]
>>> __call_rcu_common.constprop.0+0x20c/0x760
>>> [ 560.594121] Kernel unaligned access at TPC[526864]
>>> __call_rcu_common.constprop.0+0x24/0x760
>>> [ 560.594198] Kernel unaligned access at TPC[52b3c4]
>>> rcu_segcblist_enqueue+0x24/0x40
>>> [ 560.594275] Kernel unaligned access at TPC[526860]
>>> __call_rcu_common.constprop.0+0x20/0x760
>>> [ 560.594360] Kernel unaligned access at TPC[526864]
>>> __call_rcu_common.constprop.0+0x24/0x760
>>> [ 567.054127] log_unaligned: 1105 callbacks suppressed
>>> [ 567.054167] Kernel unaligned access at TPC[526860]
>>> __call_rcu_common.constprop.0+0x20/0x760
>>> [ 567.054331] Kernel unaligned access at TPC[526864]
>>> __call_rcu_common.constprop.0+0x24/0x760
>>> [ 567.054410] Kernel unaligned access at TPC[52b3c4]
>>> rcu_segcblist_enqueue+0x24/0x40
>>
>> Thanks for your report!
>>
>> On sparc64, pmd and pud levels are not of struct page:
>
> Can you elaborate, I don't understand what you mean :)

On sparc64:

static inline void pgtable_free_tlb(struct mmu_gather *tlb, void *table,
				    bool is_page)
{
	unsigned long pgf = (unsigned long)table;
	if (is_page)
		pgf |= 0x1UL;
	tlb_remove_table(tlb, (void *)pgf);
}

static inline void __tlb_remove_table(void *_table)
{
	void *table = (void *)((unsigned long)_table & ~0x1UL);
	bool is_page = false;
	if ((unsigned long)_table & 0x1UL)
		is_page = true;
	pgtable_free(table, is_page);
}

void pgtable_free(void *table, bool is_page)
{
	if (is_page)
		__pte_free(table);
	else
		kmem_cache_free(pgtable_cache, table);
}

For pmd and pud levels, is_page is false, so we can not do the following
in __tlb_remove_table_one().
```
ptdesc = table;
call_rcu(&ptdesc->pt_rcu_head, __tlb_remove_table_one_rcu);
```

>
> Is it also a problem on architectures like s390x and ppc, where we
> squeeze multiple page tables into a physical pages?

For ppc, it's the same as for sparc64. For s390x, it supports
MMU_GATHER_RCU_TABLE_FREE and defines its own pxx_free_tlb(), but these
all call tlb_remove_ptdesc(), so there is no problem.

>
>>
>> __pmd_free_tlb/__pud_free_tlb
>> --> pgtable_free_tlb(tlb, pud/pmd, false). <=== is_page == false
>> --> tlb_remove_table
>>
>> So in __tlb_remove_table_one(), the table cannot be treated as
>> ptdesc because it does not have an pt_rcu_head member.
>>
>> Hi David, it seems we still need to keep ARCH_SUPPORTS_PT_RECLAIM?
>
> Or we invert it and only disable it for the known-problematic
> architectures?

Yes, the problem lies with those architectures that support
MMU_GATHER_RCU_TABLE_FREE and define their own __tlb_remove_table().

So my plan is as follows:

1. convert __HAVE_ARCH_TLB_REMOVE_TABLE to a CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE
   Kconfig option
2. make PT_RECLAIM depend on MMU_GATHER_RCU_TABLE_FREE && !HAVE_ARCH_TLB_REMOVE_TABLE

I'll send v4 soon.

Thanks,
Qi

>

^ permalink raw reply	[flat|nested] 23+ messages in thread
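A sketch of the Kconfig shape implied by the plan above; the option names follow the outline in the mail, but the exact wording, placement and per-architecture selects are assumptions, not the actual v4 patch:

```
config HAVE_ARCH_TLB_REMOVE_TABLE
	bool
	# selected by architectures that provide their own __tlb_remove_table()

config PT_RECLAIM
	def_bool y
	depends on MMU_GATHER_RCU_TABLE_FREE && !HAVE_ARCH_TLB_REMOVE_TABLE
	help
	  Try to reclaim empty user page table pages in paths other than munmap
	  and exit_mmap path.
```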