* [PATCH v2 0/4] Fixes for hugetlb and vmalloc on arm64
@ 2025-02-17 14:04 Ryan Roberts
2025-02-17 14:04 ` [PATCH v2 1/4] mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear() Ryan Roberts
` (4 more replies)
0 siblings, 5 replies; 30+ messages in thread
From: Ryan Roberts @ 2025-02-17 14:04 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Huacai Chen, WANG Xuerui,
Thomas Bogendoerfer, James E.J. Bottomley, Helge Deller,
Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Naveen N Rao, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Gerald Schaefer,
David S. Miller, Andreas Larsson, Arnd Bergmann, Muchun Song,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
Alexandre Ghiti
Cc: Ryan Roberts, linux-arm-kernel, linux-mm, linux-kernel
Hi All,
This series contains some fixes for hugetlb on arm64, and is split out from v1
of a wider series at [1]. While the last patch technically targets core-mm
and is not directly related to arm64, I'd like it to go via the arm64 tree so
that the wider performance improvement series (v2 to be posted shortly) that
depends on this series doesn't have to be robust to the fix not being present.
I've included maintainers/reviewers for all the arches that are (trivially)
touched due to the API changes, hoping for some ACKs.
Changes since v1 [1]
====================
- Added Rb from Anshuman - Thanks!
- Added "#ifndef __PAGETABLE_PMD_FOLDED" around PUD_SIZE in flush_hugetlb_tlb_range()
I've marked all of these as candidates for backport to stable.
Applies on top of v6.14-rc3. All mm selftests run and pass.
[1] https://lore.kernel.org/linux-arm-kernel/20250205151003.88959-1-ryan.roberts@arm.com/
Thanks,
Ryan
Ryan Roberts (4):
mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear()
arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes
arm64: hugetlb: Fix flush_hugetlb_tlb_range() invalidation level
mm: Don't skip arch_sync_kernel_mappings() in error paths
arch/arm64/include/asm/hugetlb.h | 26 ++++++++++-----
arch/arm64/mm/hugetlbpage.c | 48 +++++++++++++---------------
arch/loongarch/include/asm/hugetlb.h | 6 ++--
arch/mips/include/asm/hugetlb.h | 6 ++--
arch/parisc/include/asm/hugetlb.h | 2 +-
arch/parisc/mm/hugetlbpage.c | 2 +-
arch/powerpc/include/asm/hugetlb.h | 6 ++--
arch/riscv/include/asm/hugetlb.h | 3 +-
arch/riscv/mm/hugetlbpage.c | 2 +-
arch/s390/include/asm/hugetlb.h | 12 ++++---
arch/s390/mm/hugetlbpage.c | 10 ++++--
arch/sparc/include/asm/hugetlb.h | 2 +-
arch/sparc/mm/hugetlbpage.c | 2 +-
include/asm-generic/hugetlb.h | 2 +-
include/linux/hugetlb.h | 4 ++-
mm/hugetlb.c | 4 +--
mm/memory.c | 6 ++--
mm/vmalloc.c | 4 +--
18 files changed, 87 insertions(+), 60 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v2 1/4] mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear()
2025-02-17 14:04 [PATCH v2 0/4] Fixes for hugetlb and vmalloc on arm64 Ryan Roberts
@ 2025-02-17 14:04 ` Ryan Roberts
2025-02-19 8:28 ` Anshuman Khandual
` (4 more replies)
2025-02-17 14:04 ` [PATCH v2 2/4] arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes Ryan Roberts
` (3 subsequent siblings)
4 siblings, 5 replies; 30+ messages in thread
From: Ryan Roberts @ 2025-02-17 14:04 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Huacai Chen, WANG Xuerui,
Thomas Bogendoerfer, James E.J. Bottomley, Helge Deller,
Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Naveen N Rao, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Gerald Schaefer,
David S. Miller, Andreas Larsson, Arnd Bergmann, Muchun Song,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
Alexandre Ghiti
Cc: Ryan Roberts, linux-arm-kernel, linux-mm, linux-kernel, stable
In order to fix a bug, arm64 needs to be told the size of the huge page
for which the huge_pte is being set in huge_ptep_get_and_clear().
Provide for this by adding an `unsigned long sz` parameter to the
function. This follows the same pattern as huge_pte_clear() and
set_huge_pte_at().
This commit makes the required interface modifications to the core mm as
well as all arches that implement this function (arm64, loongarch, mips,
parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed
in a separate commit.
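As a minimal userspace sketch of the resulting calling convention (mocked
types and illustrative names, not kernel code): arches that have no use for
the size, like the asm-generic fallback, simply ignore the new argument, so
core-mm can pass the size unconditionally.

```c
#include <assert.h>

/* Userspace sketch of the asm-generic fallback after this change; pte_t
 * is mocked as a plain integer. Illustrative only. */
typedef unsigned long mock_pte_t;

static mock_pte_t mock_huge_ptep_get_and_clear(mock_pte_t *ptep,
					       unsigned long sz)
{
	mock_pte_t old = *ptep;

	(void)sz;	/* the generic version ignores the size hint */
	*ptep = 0;
	return old;
}
```

Callers would supply the size the same way they already do for
huge_pte_clear() and set_huge_pte_at(), e.g. via huge_page_size().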
Cc: stable@vger.kernel.org
Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
arch/arm64/include/asm/hugetlb.h | 4 ++--
arch/arm64/mm/hugetlbpage.c | 8 +++++---
arch/loongarch/include/asm/hugetlb.h | 6 ++++--
arch/mips/include/asm/hugetlb.h | 6 ++++--
arch/parisc/include/asm/hugetlb.h | 2 +-
arch/parisc/mm/hugetlbpage.c | 2 +-
arch/powerpc/include/asm/hugetlb.h | 6 ++++--
arch/riscv/include/asm/hugetlb.h | 3 ++-
arch/riscv/mm/hugetlbpage.c | 2 +-
arch/s390/include/asm/hugetlb.h | 12 ++++++++----
arch/s390/mm/hugetlbpage.c | 10 ++++++++--
arch/sparc/include/asm/hugetlb.h | 2 +-
arch/sparc/mm/hugetlbpage.c | 2 +-
include/asm-generic/hugetlb.h | 2 +-
include/linux/hugetlb.h | 4 +++-
mm/hugetlb.c | 4 ++--
16 files changed, 48 insertions(+), 27 deletions(-)
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index c6dff3e69539..03db9cb21ace 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -42,8 +42,8 @@ extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep,
pte_t pte, int dirty);
#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
-extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
- unsigned long addr, pte_t *ptep);
+extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, unsigned long sz);
#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pte_t *ptep);
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 98a2a0e64e25..06db4649af91 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -396,8 +396,8 @@ void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
__pte_clear(mm, addr, ptep);
}
-pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
- unsigned long addr, pte_t *ptep)
+pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, unsigned long sz)
{
int ncontig;
size_t pgsize;
@@ -549,6 +549,8 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep)
{
+ unsigned long psize = huge_page_size(hstate_vma(vma));
+
if (alternative_has_cap_unlikely(ARM64_WORKAROUND_2645198)) {
/*
* Break-before-make (BBM) is required for all user space mappings
@@ -558,7 +560,7 @@ pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr
if (pte_user_exec(__ptep_get(ptep)))
return huge_ptep_clear_flush(vma, addr, ptep);
}
- return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
+ return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep, psize);
}
void huge_ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep,
diff --git a/arch/loongarch/include/asm/hugetlb.h b/arch/loongarch/include/asm/hugetlb.h
index c8e4057734d0..4dc4b3e04225 100644
--- a/arch/loongarch/include/asm/hugetlb.h
+++ b/arch/loongarch/include/asm/hugetlb.h
@@ -36,7 +36,8 @@ static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
- unsigned long addr, pte_t *ptep)
+ unsigned long addr, pte_t *ptep,
+ unsigned long sz)
{
pte_t clear;
pte_t pte = ptep_get(ptep);
@@ -51,8 +52,9 @@ static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
pte_t pte;
+ unsigned long sz = huge_page_size(hstate_vma(vma));
- pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
+ pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep, sz);
flush_tlb_page(vma, addr);
return pte;
}
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index d0a86ce83de9..fbc71ddcf0f6 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -27,7 +27,8 @@ static inline int prepare_hugepage_range(struct file *file,
#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
- unsigned long addr, pte_t *ptep)
+ unsigned long addr, pte_t *ptep,
+ unsigned long sz)
{
pte_t clear;
pte_t pte = *ptep;
@@ -42,13 +43,14 @@ static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
pte_t pte;
+ unsigned long sz = huge_page_size(hstate_vma(vma));
/*
* clear the huge pte entry firstly, so that the other smp threads will
* not get old pte entry after finishing flush_tlb_page and before
* setting new huge pte entry
*/
- pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
+ pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep, sz);
flush_tlb_page(vma, addr);
return pte;
}
diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
index 5b3a5429f71b..21e9ace17739 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -10,7 +10,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep);
+ pte_t *ptep, unsigned long sz);
#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
diff --git a/arch/parisc/mm/hugetlbpage.c b/arch/parisc/mm/hugetlbpage.c
index e9d18cf25b79..a94fe546d434 100644
--- a/arch/parisc/mm/hugetlbpage.c
+++ b/arch/parisc/mm/hugetlbpage.c
@@ -126,7 +126,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep)
+ pte_t *ptep, unsigned long sz)
{
pte_t entry;
diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index dad2e7980f24..86326587e58d 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -45,7 +45,8 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
- unsigned long addr, pte_t *ptep)
+ unsigned long addr, pte_t *ptep,
+ unsigned long sz)
{
return __pte(pte_update(mm, addr, ptep, ~0UL, 0, 1));
}
@@ -55,8 +56,9 @@ static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
pte_t pte;
+ unsigned long sz = huge_page_size(hstate_vma(vma));
- pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
+ pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep, sz);
flush_hugetlb_page(vma, addr);
return pte;
}
diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h
index faf3624d8057..446126497768 100644
--- a/arch/riscv/include/asm/hugetlb.h
+++ b/arch/riscv/include/asm/hugetlb.h
@@ -28,7 +28,8 @@ void set_huge_pte_at(struct mm_struct *mm,
#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
- unsigned long addr, pte_t *ptep);
+ unsigned long addr, pte_t *ptep,
+ unsigned long sz);
#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index 42314f093922..b4a78a4b35cf 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -293,7 +293,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr,
- pte_t *ptep)
+ pte_t *ptep, unsigned long sz)
{
pte_t orig_pte = ptep_get(ptep);
int pte_num;
diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetlb.h
index 7c52acaf9f82..420c74306779 100644
--- a/arch/s390/include/asm/hugetlb.h
+++ b/arch/s390/include/asm/hugetlb.h
@@ -26,7 +26,11 @@ void __set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
-pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
+pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
+ unsigned long addr, pte_t *ptep,
+ unsigned long sz);
+pte_t __huge_ptep_get_and_clear(struct mm_struct *mm,
+ unsigned long addr, pte_t *ptep);
static inline void arch_clear_hugetlb_flags(struct folio *folio)
{
@@ -48,7 +52,7 @@ static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
unsigned long address, pte_t *ptep)
{
- return huge_ptep_get_and_clear(vma->vm_mm, address, ptep);
+ return __huge_ptep_get_and_clear(vma->vm_mm, address, ptep);
}
#define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
@@ -59,7 +63,7 @@ static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
int changed = !pte_same(huge_ptep_get(vma->vm_mm, addr, ptep), pte);
if (changed) {
- huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
+ __huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
__set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
}
return changed;
@@ -69,7 +73,7 @@ static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
{
- pte_t pte = huge_ptep_get_and_clear(mm, addr, ptep);
+ pte_t pte = __huge_ptep_get_and_clear(mm, addr, ptep);
__set_huge_pte_at(mm, addr, ptep, pte_wrprotect(pte));
}
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index d9ce199953de..52ee8e854195 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -188,8 +188,8 @@ pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
return __rste_to_pte(pte_val(*ptep));
}
-pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
- unsigned long addr, pte_t *ptep)
+pte_t __huge_ptep_get_and_clear(struct mm_struct *mm,
+ unsigned long addr, pte_t *ptep)
{
pte_t pte = huge_ptep_get(mm, addr, ptep);
pmd_t *pmdp = (pmd_t *) ptep;
@@ -202,6 +202,12 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
return pte;
}
+pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
+ unsigned long addr, pte_t *ptep, unsigned long sz)
+{
+ return __huge_ptep_get_and_clear(mm, addr, ptep);
+}
+
pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, unsigned long sz)
{
diff --git a/arch/sparc/include/asm/hugetlb.h b/arch/sparc/include/asm/hugetlb.h
index c714ca6a05aa..e7a9cdd498dc 100644
--- a/arch/sparc/include/asm/hugetlb.h
+++ b/arch/sparc/include/asm/hugetlb.h
@@ -20,7 +20,7 @@ void __set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep);
+ pte_t *ptep, unsigned long sz);
#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index eee601a0d2cf..80504148d8a5 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -260,7 +260,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
}
pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep)
+ pte_t *ptep, unsigned long sz)
{
unsigned int i, nptes, orig_shift, shift;
unsigned long size;
diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index f42133dae68e..2afc95bf1655 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -90,7 +90,7 @@ static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
#ifndef __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
- unsigned long addr, pte_t *ptep)
+ unsigned long addr, pte_t *ptep, unsigned long sz)
{
return ptep_get_and_clear(mm, addr, ptep);
}
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ec8c0ccc8f95..bf5f7256bd28 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1004,7 +1004,9 @@ static inline void hugetlb_count_sub(long l, struct mm_struct *mm)
static inline pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
- return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
+ unsigned long psize = huge_page_size(hstate_vma(vma));
+
+ return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep, psize);
}
#endif
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 65068671e460..de9d49e521c1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5447,7 +5447,7 @@ static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
if (src_ptl != dst_ptl)
spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
- pte = huge_ptep_get_and_clear(mm, old_addr, src_pte);
+ pte = huge_ptep_get_and_clear(mm, old_addr, src_pte, sz);
if (need_clear_uffd_wp && pte_marker_uffd_wp(pte))
huge_pte_clear(mm, new_addr, dst_pte, sz);
@@ -5622,7 +5622,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
set_vma_resv_flags(vma, HPAGE_RESV_UNMAPPED);
}
- pte = huge_ptep_get_and_clear(mm, address, ptep);
+ pte = huge_ptep_get_and_clear(mm, address, ptep, sz);
tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
if (huge_pte_dirty(pte))
set_page_dirty(page);
--
2.43.0
* [PATCH v2 2/4] arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes
2025-02-17 14:04 [PATCH v2 0/4] Fixes for hugetlb and vmalloc on arm64 Ryan Roberts
2025-02-17 14:04 ` [PATCH v2 1/4] mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear() Ryan Roberts
@ 2025-02-17 14:04 ` Ryan Roberts
2025-02-19 8:45 ` Anshuman Khandual
` (3 more replies)
2025-02-17 14:04 ` [PATCH v2 3/4] arm64: hugetlb: Fix flush_hugetlb_tlb_range() invalidation level Ryan Roberts
` (2 subsequent siblings)
4 siblings, 4 replies; 30+ messages in thread
From: Ryan Roberts @ 2025-02-17 14:04 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Huacai Chen, WANG Xuerui,
Thomas Bogendoerfer, James E.J. Bottomley, Helge Deller,
Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Naveen N Rao, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Gerald Schaefer,
David S. Miller, Andreas Larsson, Arnd Bergmann, Muchun Song,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
Alexandre Ghiti
Cc: Ryan Roberts, linux-arm-kernel, linux-mm, linux-kernel, stable
arm64 supports multiple huge_pte sizes. Some of the sizes are covered by
a single pte entry at a particular level (PMD_SIZE, PUD_SIZE), and some
are covered by multiple ptes at a particular level (CONT_PTE_SIZE,
CONT_PMD_SIZE). So the function has to figure out the size from the
huge_pte pointer. This was previously done by walking the pgtable to
determine the level and by using the PTE_CONT bit to determine the
number of ptes at the level.
But the PTE_CONT bit is only valid when the pte is present. For
non-present pte values (e.g. markers, migration entries), the previous
implementation was therefore erroneously determining the size. There is
at least one known caller in core-mm, move_huge_pte(), which may call
huge_ptep_get_and_clear() for a non-present pte. So we must be robust to
this case. Additionally, the "regular" ptep_get_and_clear() is robust to
being called for non-present ptes, so it makes sense to follow that
behaviour.
Fix this by using the new sz parameter which is now provided to the
function. Additionally when clearing each pte in a contig range, don't
gather the access and dirty bits if the pte is not present.
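The size-to-geometry mapping the fix relies on can be sketched in userspace,
assuming a 4K granule (where CONT_PTE_SIZE = 16 * PAGE_SIZE = 64K and
CONT_PMD_SIZE = 16 * PMD_SIZE = 32M); the kernel's num_contig_ptes() plays
this role, and all names here are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of deriving (ncontig, pgsize) purely from the huge page size,
 * with no dependency on the (possibly non-present) pte's PTE_CONT bit.
 * Constants assume a 4K granule. */
#define PAGE_SIZE_4K		(4UL << 10)
#define PMD_SIZE_4K		(2UL << 20)
#define PUD_SIZE_4K		(1UL << 30)
#define CONT_PTE_SIZE_4K	(16 * PAGE_SIZE_4K)	/* 64K */
#define CONT_PMD_SIZE_4K	(16 * PMD_SIZE_4K)	/* 32M */

static int sketch_num_contig_ptes(unsigned long sz, size_t *pgsize)
{
	switch (sz) {
	case PUD_SIZE_4K:
	case PMD_SIZE_4K:
		*pgsize = sz;		/* one entry at that level */
		return 1;
	case CONT_PMD_SIZE_4K:
		*pgsize = PMD_SIZE_4K;	/* 16 contiguous PMDs */
		return 16;
	case CONT_PTE_SIZE_4K:
		*pgsize = PAGE_SIZE_4K;	/* 16 contiguous PTEs */
		return 16;
	}
	return 0;
}
```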
An alternative approach that would not require API changes would be to
store the PTE_CONT bit in a spare bit in the swap entry pte for the
non-present case. But it felt cleaner to follow other APIs' lead and
just pass in the size.
As an aside, PTE_CONT is bit 52, which corresponds to bit 40 in the swap
entry offset field (layout of non-present pte). Since hugetlb is never
swapped to disk, this field will only be populated for markers, which
always set this bit to 0, and hwpoison swap entries, which set the offset
field to a PFN. So it would only ever be 1 for a 52-bit PVA system where
memory in that high half was poisoned (I think!). So in practice, this
bit would almost always be zero for non-present ptes and we would only
clear the first entry if it was actually a contiguous block. That's
probably a less severe symptom than if it was always interpreted as 1
and cleared out potentially-present neighboring PTEs.
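The aside's arithmetic can be checked with a trivial standalone snippet; the
offset-field position (pte bit 52 landing on offset bit 40, i.e. the offset
field starting at pte bit 12) is taken from the commit text above, not
independently verified against the arm64 swap pte layout:

```c
#include <assert.h>

/* pte bit -> swap offset bit, given where the offset field starts in the
 * non-present pte layout. Illustrative only. */
static int swp_offset_bit_of_pte_bit(int pte_bit, int offset_shift)
{
	return pte_bit - offset_shift;
}
```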
Cc: stable@vger.kernel.org
Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
arch/arm64/mm/hugetlbpage.c | 40 ++++++++++++++++---------------------
1 file changed, 17 insertions(+), 23 deletions(-)
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 06db4649af91..614b2feddba2 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -163,24 +163,23 @@ static pte_t get_clear_contig(struct mm_struct *mm,
unsigned long pgsize,
unsigned long ncontig)
{
- pte_t orig_pte = __ptep_get(ptep);
- unsigned long i;
-
- for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
- pte_t pte = __ptep_get_and_clear(mm, addr, ptep);
-
- /*
- * If HW_AFDBM is enabled, then the HW could turn on
- * the dirty or accessed bit for any page in the set,
- * so check them all.
- */
- if (pte_dirty(pte))
- orig_pte = pte_mkdirty(orig_pte);
-
- if (pte_young(pte))
- orig_pte = pte_mkyoung(orig_pte);
+ pte_t pte, tmp_pte;
+ bool present;
+
+ pte = __ptep_get_and_clear(mm, addr, ptep);
+ present = pte_present(pte);
+ while (--ncontig) {
+ ptep++;
+ addr += pgsize;
+ tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
+ if (present) {
+ if (pte_dirty(tmp_pte))
+ pte = pte_mkdirty(pte);
+ if (pte_young(tmp_pte))
+ pte = pte_mkyoung(pte);
+ }
}
- return orig_pte;
+ return pte;
}
static pte_t get_clear_contig_flush(struct mm_struct *mm,
@@ -401,13 +400,8 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
{
int ncontig;
size_t pgsize;
- pte_t orig_pte = __ptep_get(ptep);
-
- if (!pte_cont(orig_pte))
- return __ptep_get_and_clear(mm, addr, ptep);
-
- ncontig = find_num_contig(mm, addr, ptep, &pgsize);
+ ncontig = num_contig_ptes(sz, &pgsize);
return get_clear_contig(mm, addr, ptep, pgsize, ncontig);
}
--
2.43.0
* [PATCH v2 3/4] arm64: hugetlb: Fix flush_hugetlb_tlb_range() invalidation level
2025-02-17 14:04 [PATCH v2 0/4] Fixes for hugetlb and vmalloc on arm64 Ryan Roberts
2025-02-17 14:04 ` [PATCH v2 1/4] mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear() Ryan Roberts
2025-02-17 14:04 ` [PATCH v2 2/4] arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes Ryan Roberts
@ 2025-02-17 14:04 ` Ryan Roberts
2025-02-19 8:56 ` Anshuman Khandual
2025-02-19 19:04 ` Catalin Marinas
2025-02-17 14:04 ` [PATCH v2 4/4] mm: Don't skip arch_sync_kernel_mappings() in error paths Ryan Roberts
2025-02-19 19:10 ` [PATCH v2 0/4] Fixes for hugetlb and vmalloc on arm64 Catalin Marinas
4 siblings, 2 replies; 30+ messages in thread
From: Ryan Roberts @ 2025-02-17 14:04 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Huacai Chen, WANG Xuerui,
Thomas Bogendoerfer, James E.J. Bottomley, Helge Deller,
Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Naveen N Rao, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Gerald Schaefer,
David S. Miller, Andreas Larsson, Arnd Bergmann, Muchun Song,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
Alexandre Ghiti
Cc: Ryan Roberts, linux-arm-kernel, linux-mm, linux-kernel, stable
commit c910f2b65518 ("arm64/mm: Update tlb invalidation routines for
FEAT_LPA2") changed the "invalidation level unknown" hint from 0 to
TLBI_TTL_UNKNOWN (INT_MAX). But the fallback "unknown level" path in
flush_hugetlb_tlb_range() was not updated. So as it stands, when trying
to invalidate CONT_PMD_SIZE or CONT_PTE_SIZE hugetlb mappings, we will
spuriously try to invalidate at level 0 on LPA2-enabled systems.
Fix this so that the fallback passes TLBI_TTL_UNKNOWN, and while we are
at it, explicitly use the correct stride and level for CONT_PMD_SIZE and
CONT_PTE_SIZE, which should provide a minor optimization.
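The fixed stride-to-level selection can be sketched as a standalone mock,
assuming a 4K granule (PUD_SIZE = 1G, CONT_PMD_SIZE = 32M, PMD_SIZE = 2M,
CONT_PTE_SIZE = 64K); TLBI_TTL_UNKNOWN is stood in for by INT_MAX, matching
its kernel definition, and all names are illustrative:

```c
#include <assert.h>
#include <limits.h>

#define TLBI_TTL_UNKNOWN_SKETCH	INT_MAX

/* Sketch of the switch this patch introduces: known huge page sizes get
 * an exact translation level hint; anything else falls back to the
 * "level unknown" hint rather than level 0. 4K granule constants. */
static int sketch_tlbi_level(unsigned long stride)
{
	switch (stride) {
	case 1UL << 30:		/* PUD_SIZE */
		return 1;
	case 32UL << 20:	/* CONT_PMD_SIZE */
	case 2UL << 20:		/* PMD_SIZE */
		return 2;
	case 64UL << 10:	/* CONT_PTE_SIZE */
		return 3;
	default:
		return TLBI_TTL_UNKNOWN_SKETCH;
	}
}
```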
Cc: stable@vger.kernel.org
Fixes: c910f2b65518 ("arm64/mm: Update tlb invalidation routines for FEAT_LPA2")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
arch/arm64/include/asm/hugetlb.h | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 03db9cb21ace..07fbf5bf85a7 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -76,12 +76,22 @@ static inline void flush_hugetlb_tlb_range(struct vm_area_struct *vma,
{
unsigned long stride = huge_page_size(hstate_vma(vma));
- if (stride == PMD_SIZE)
- __flush_tlb_range(vma, start, end, stride, false, 2);
- else if (stride == PUD_SIZE)
- __flush_tlb_range(vma, start, end, stride, false, 1);
- else
- __flush_tlb_range(vma, start, end, PAGE_SIZE, false, 0);
+ switch (stride) {
+#ifndef __PAGETABLE_PMD_FOLDED
+ case PUD_SIZE:
+ __flush_tlb_range(vma, start, end, PUD_SIZE, false, 1);
+ break;
+#endif
+ case CONT_PMD_SIZE:
+ case PMD_SIZE:
+ __flush_tlb_range(vma, start, end, PMD_SIZE, false, 2);
+ break;
+ case CONT_PTE_SIZE:
+ __flush_tlb_range(vma, start, end, PAGE_SIZE, false, 3);
+ break;
+ default:
+ __flush_tlb_range(vma, start, end, PAGE_SIZE, false, TLBI_TTL_UNKNOWN);
+ }
}
#endif /* __ASM_HUGETLB_H */
--
2.43.0
* [PATCH v2 4/4] mm: Don't skip arch_sync_kernel_mappings() in error paths
2025-02-17 14:04 [PATCH v2 0/4] Fixes for hugetlb and vmalloc on arm64 Ryan Roberts
` (2 preceding siblings ...)
2025-02-17 14:04 ` [PATCH v2 3/4] arm64: hugetlb: Fix flush_hugetlb_tlb_range() invalidation level Ryan Roberts
@ 2025-02-17 14:04 ` Ryan Roberts
2025-02-19 19:07 ` Catalin Marinas
2025-02-19 19:10 ` [PATCH v2 0/4] Fixes for hugetlb and vmalloc on arm64 Catalin Marinas
4 siblings, 1 reply; 30+ messages in thread
From: Ryan Roberts @ 2025-02-17 14:04 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Huacai Chen, WANG Xuerui,
Thomas Bogendoerfer, James E.J. Bottomley, Helge Deller,
Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Naveen N Rao, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Gerald Schaefer,
David S. Miller, Andreas Larsson, Arnd Bergmann, Muchun Song,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
Alexandre Ghiti
Cc: Ryan Roberts, linux-arm-kernel, linux-mm, linux-kernel, stable
Fix callers that previously skipped calling arch_sync_kernel_mappings()
if an error occurred during a pgtable update. The call is still required
to sync any pgtable updates that may have occurred prior to hitting the
error condition.
These are theoretical bugs discovered during code review.
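The control-flow pattern of the fix can be illustrated with a small
userspace mock (all names invented for illustration): on error, break out of
the walk rather than returning immediately, so the sync step still runs for
any page-table levels that were already modified.

```c
#include <assert.h>

/* Mock of the fixed error path in a page-table walk. */
static int sync_calls;

static void mock_arch_sync_kernel_mappings(void)
{
	sync_calls++;
}

/* fail_at: index of the step that fails (-1 for none). */
static int mock_map_range(int nsteps, int fail_at)
{
	int err = 0;
	unsigned int mask = 0;

	for (int i = 0; i < nsteps; i++) {
		mask |= 1;		/* pretend this step modified a level */
		if (i == fail_at) {
			err = -22;	/* -EINVAL */
			break;		/* was: return err; which skipped the sync */
		}
	}
	/* Sync runs even on error, for the levels modified so far. */
	if (mask)
		mock_arch_sync_kernel_mappings();
	return err;
}
```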
Cc: stable@vger.kernel.org
Fixes: 2ba3e6947aed ("mm/vmalloc: track which page-table levels were modified")
Fixes: 0c95cba49255 ("mm: apply_to_pte_range warn and fail if a large pte is encountered")
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
mm/memory.c | 6 ++++--
mm/vmalloc.c | 4 ++--
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 539c0f7c6d54..a15f7dd500ea 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3040,8 +3040,10 @@ static int __apply_to_page_range(struct mm_struct *mm, unsigned long addr,
next = pgd_addr_end(addr, end);
if (pgd_none(*pgd) && !create)
continue;
- if (WARN_ON_ONCE(pgd_leaf(*pgd)))
- return -EINVAL;
+ if (WARN_ON_ONCE(pgd_leaf(*pgd))) {
+ err = -EINVAL;
+ break;
+ }
if (!pgd_none(*pgd) && WARN_ON_ONCE(pgd_bad(*pgd))) {
if (!create)
continue;
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a6e7acebe9ad..61981ee1c9d2 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -586,13 +586,13 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
mask |= PGTBL_PGD_MODIFIED;
err = vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr, &mask);
if (err)
- return err;
+ break;
} while (pgd++, addr = next, addr != end);
if (mask & ARCH_PAGE_TABLE_SYNC_MASK)
arch_sync_kernel_mappings(start, end);
- return 0;
+ return err;
}
/*
--
2.43.0
* Re: [PATCH v2 1/4] mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear()
2025-02-17 14:04 ` [PATCH v2 1/4] mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear() Ryan Roberts
@ 2025-02-19 8:28 ` Anshuman Khandual
2025-02-19 19:00 ` Catalin Marinas
` (3 subsequent siblings)
4 siblings, 0 replies; 30+ messages in thread
From: Anshuman Khandual @ 2025-02-19 8:28 UTC (permalink / raw)
To: Ryan Roberts, Catalin Marinas, Will Deacon, Huacai Chen,
WANG Xuerui, Thomas Bogendoerfer, James E.J. Bottomley,
Helge Deller, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Naveen N Rao, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Gerald Schaefer, David S. Miller, Andreas Larsson, Arnd Bergmann,
Muchun Song, Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Dev Jain, Kevin Brodsky, Alexandre Ghiti
Cc: linux-arm-kernel, linux-mm, linux-kernel, stable
On 2/17/25 19:34, Ryan Roberts wrote:
> In order to fix a bug, arm64 needs to be told the size of the huge page
> for which the huge_pte is being set in huge_ptep_get_and_clear().
> Provide for this by adding an `unsigned long sz` parameter to the
> function. This follows the same pattern as huge_pte_clear() and
> set_huge_pte_at().
>
> This commit makes the required interface modifications to the core mm as
> well as all arches that implement this function (arm64, loongarch, mips,
> parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed
> in a separate commit.
>
> Cc: stable@vger.kernel.org
> Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
LGTM
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
> arch/arm64/include/asm/hugetlb.h | 4 ++--
> arch/arm64/mm/hugetlbpage.c | 8 +++++---
> arch/loongarch/include/asm/hugetlb.h | 6 ++++--
> arch/mips/include/asm/hugetlb.h | 6 ++++--
> arch/parisc/include/asm/hugetlb.h | 2 +-
> arch/parisc/mm/hugetlbpage.c | 2 +-
> arch/powerpc/include/asm/hugetlb.h | 6 ++++--
> arch/riscv/include/asm/hugetlb.h | 3 ++-
> arch/riscv/mm/hugetlbpage.c | 2 +-
> arch/s390/include/asm/hugetlb.h | 12 ++++++++----
> arch/s390/mm/hugetlbpage.c | 10 ++++++++--
> arch/sparc/include/asm/hugetlb.h | 2 +-
> arch/sparc/mm/hugetlbpage.c | 2 +-
> include/asm-generic/hugetlb.h | 2 +-
> include/linux/hugetlb.h | 4 +++-
> mm/hugetlb.c | 4 ++--
> 16 files changed, 48 insertions(+), 27 deletions(-)
>
> diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
> index c6dff3e69539..03db9cb21ace 100644
> --- a/arch/arm64/include/asm/hugetlb.h
> +++ b/arch/arm64/include/asm/hugetlb.h
> @@ -42,8 +42,8 @@ extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep,
> pte_t pte, int dirty);
> #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> -extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep);
> +extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
> + pte_t *ptep, unsigned long sz);
> #define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
> extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
> unsigned long addr, pte_t *ptep);
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 98a2a0e64e25..06db4649af91 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -396,8 +396,8 @@ void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
> __pte_clear(mm, addr, ptep);
> }
>
> -pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep)
> +pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
> + pte_t *ptep, unsigned long sz)
> {
> int ncontig;
> size_t pgsize;
> @@ -549,6 +549,8 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
>
> pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep)
> {
> + unsigned long psize = huge_page_size(hstate_vma(vma));
> +
> if (alternative_has_cap_unlikely(ARM64_WORKAROUND_2645198)) {
> /*
> * Break-before-make (BBM) is required for all user space mappings
> @@ -558,7 +560,7 @@ pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr
> if (pte_user_exec(__ptep_get(ptep)))
> return huge_ptep_clear_flush(vma, addr, ptep);
> }
> - return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
> + return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep, psize);
> }
>
> void huge_ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep,
> diff --git a/arch/loongarch/include/asm/hugetlb.h b/arch/loongarch/include/asm/hugetlb.h
> index c8e4057734d0..4dc4b3e04225 100644
> --- a/arch/loongarch/include/asm/hugetlb.h
> +++ b/arch/loongarch/include/asm/hugetlb.h
> @@ -36,7 +36,8 @@ static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
>
> #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep)
> + unsigned long addr, pte_t *ptep,
> + unsigned long sz)
> {
> pte_t clear;
> pte_t pte = ptep_get(ptep);
> @@ -51,8 +52,9 @@ static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep)
> {
> pte_t pte;
> + unsigned long sz = huge_page_size(hstate_vma(vma));
>
> - pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
> + pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep, sz);
> flush_tlb_page(vma, addr);
> return pte;
> }
> diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
> index d0a86ce83de9..fbc71ddcf0f6 100644
> --- a/arch/mips/include/asm/hugetlb.h
> +++ b/arch/mips/include/asm/hugetlb.h
> @@ -27,7 +27,8 @@ static inline int prepare_hugepage_range(struct file *file,
>
> #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep)
> + unsigned long addr, pte_t *ptep,
> + unsigned long sz)
> {
> pte_t clear;
> pte_t pte = *ptep;
> @@ -42,13 +43,14 @@ static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep)
> {
> pte_t pte;
> + unsigned long sz = huge_page_size(hstate_vma(vma));
>
> /*
> * clear the huge pte entry firstly, so that the other smp threads will
> * not get old pte entry after finishing flush_tlb_page and before
> * setting new huge pte entry
> */
> - pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
> + pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep, sz);
> flush_tlb_page(vma, addr);
> return pte;
> }
> diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
> index 5b3a5429f71b..21e9ace17739 100644
> --- a/arch/parisc/include/asm/hugetlb.h
> +++ b/arch/parisc/include/asm/hugetlb.h
> @@ -10,7 +10,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>
> #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
> - pte_t *ptep);
> + pte_t *ptep, unsigned long sz);
>
> #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
> static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> diff --git a/arch/parisc/mm/hugetlbpage.c b/arch/parisc/mm/hugetlbpage.c
> index e9d18cf25b79..a94fe546d434 100644
> --- a/arch/parisc/mm/hugetlbpage.c
> +++ b/arch/parisc/mm/hugetlbpage.c
> @@ -126,7 +126,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>
>
> pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
> - pte_t *ptep)
> + pte_t *ptep, unsigned long sz)
> {
> pte_t entry;
>
> diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
> index dad2e7980f24..86326587e58d 100644
> --- a/arch/powerpc/include/asm/hugetlb.h
> +++ b/arch/powerpc/include/asm/hugetlb.h
> @@ -45,7 +45,8 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
>
> #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep)
> + unsigned long addr, pte_t *ptep,
> + unsigned long sz)
> {
> return __pte(pte_update(mm, addr, ptep, ~0UL, 0, 1));
> }
> @@ -55,8 +56,9 @@ static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep)
> {
> pte_t pte;
> + unsigned long sz = huge_page_size(hstate_vma(vma));
>
> - pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
> + pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep, sz);
> flush_hugetlb_page(vma, addr);
> return pte;
> }
> diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h
> index faf3624d8057..446126497768 100644
> --- a/arch/riscv/include/asm/hugetlb.h
> +++ b/arch/riscv/include/asm/hugetlb.h
> @@ -28,7 +28,8 @@ void set_huge_pte_at(struct mm_struct *mm,
>
> #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep);
> + unsigned long addr, pte_t *ptep,
> + unsigned long sz);
>
> #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
> pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
> index 42314f093922..b4a78a4b35cf 100644
> --- a/arch/riscv/mm/hugetlbpage.c
> +++ b/arch/riscv/mm/hugetlbpage.c
> @@ -293,7 +293,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>
> pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> unsigned long addr,
> - pte_t *ptep)
> + pte_t *ptep, unsigned long sz)
> {
> pte_t orig_pte = ptep_get(ptep);
> int pte_num;
> diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetlb.h
> index 7c52acaf9f82..420c74306779 100644
> --- a/arch/s390/include/asm/hugetlb.h
> +++ b/arch/s390/include/asm/hugetlb.h
> @@ -26,7 +26,11 @@ void __set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
>
> #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> -pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
> +pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> + unsigned long addr, pte_t *ptep,
> + unsigned long sz);
> +pte_t __huge_ptep_get_and_clear(struct mm_struct *mm,
> + unsigned long addr, pte_t *ptep);
>
> static inline void arch_clear_hugetlb_flags(struct folio *folio)
> {
> @@ -48,7 +52,7 @@ static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
> static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> unsigned long address, pte_t *ptep)
> {
> - return huge_ptep_get_and_clear(vma->vm_mm, address, ptep);
> + return __huge_ptep_get_and_clear(vma->vm_mm, address, ptep);
> }
>
> #define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
> @@ -59,7 +63,7 @@ static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> int changed = !pte_same(huge_ptep_get(vma->vm_mm, addr, ptep), pte);
>
> if (changed) {
> - huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
> + __huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
> __set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
> }
> return changed;
> @@ -69,7 +73,7 @@ static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> unsigned long addr, pte_t *ptep)
> {
> - pte_t pte = huge_ptep_get_and_clear(mm, addr, ptep);
> + pte_t pte = __huge_ptep_get_and_clear(mm, addr, ptep);
>
> __set_huge_pte_at(mm, addr, ptep, pte_wrprotect(pte));
> }
> diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
> index d9ce199953de..52ee8e854195 100644
> --- a/arch/s390/mm/hugetlbpage.c
> +++ b/arch/s390/mm/hugetlbpage.c
> @@ -188,8 +188,8 @@ pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> return __rste_to_pte(pte_val(*ptep));
> }
>
> -pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep)
> +pte_t __huge_ptep_get_and_clear(struct mm_struct *mm,
> + unsigned long addr, pte_t *ptep)
> {
> pte_t pte = huge_ptep_get(mm, addr, ptep);
> pmd_t *pmdp = (pmd_t *) ptep;
> @@ -202,6 +202,12 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> return pte;
> }
>
> +pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> + unsigned long addr, pte_t *ptep, unsigned long sz)
> +{
> + return __huge_ptep_get_and_clear(mm, addr, ptep);
> +}
> +
> pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
> unsigned long addr, unsigned long sz)
> {
> diff --git a/arch/sparc/include/asm/hugetlb.h b/arch/sparc/include/asm/hugetlb.h
> index c714ca6a05aa..e7a9cdd498dc 100644
> --- a/arch/sparc/include/asm/hugetlb.h
> +++ b/arch/sparc/include/asm/hugetlb.h
> @@ -20,7 +20,7 @@ void __set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>
> #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
> - pte_t *ptep);
> + pte_t *ptep, unsigned long sz);
>
> #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
> static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
> index eee601a0d2cf..80504148d8a5 100644
> --- a/arch/sparc/mm/hugetlbpage.c
> +++ b/arch/sparc/mm/hugetlbpage.c
> @@ -260,7 +260,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> }
>
> pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
> - pte_t *ptep)
> + pte_t *ptep, unsigned long sz)
> {
> unsigned int i, nptes, orig_shift, shift;
> unsigned long size;
> diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
> index f42133dae68e..2afc95bf1655 100644
> --- a/include/asm-generic/hugetlb.h
> +++ b/include/asm-generic/hugetlb.h
> @@ -90,7 +90,7 @@ static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>
> #ifndef __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep)
> + unsigned long addr, pte_t *ptep, unsigned long sz)
> {
> return ptep_get_and_clear(mm, addr, ptep);
> }
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index ec8c0ccc8f95..bf5f7256bd28 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -1004,7 +1004,9 @@ static inline void hugetlb_count_sub(long l, struct mm_struct *mm)
> static inline pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep)
> {
> - return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
> + unsigned long psize = huge_page_size(hstate_vma(vma));
> +
> + return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep, psize);
> }
> #endif
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 65068671e460..de9d49e521c1 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5447,7 +5447,7 @@ static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
> if (src_ptl != dst_ptl)
> spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
>
> - pte = huge_ptep_get_and_clear(mm, old_addr, src_pte);
> + pte = huge_ptep_get_and_clear(mm, old_addr, src_pte, sz);
>
> if (need_clear_uffd_wp && pte_marker_uffd_wp(pte))
> huge_pte_clear(mm, new_addr, dst_pte, sz);
> @@ -5622,7 +5622,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> set_vma_resv_flags(vma, HPAGE_RESV_UNMAPPED);
> }
>
> - pte = huge_ptep_get_and_clear(mm, address, ptep);
> + pte = huge_ptep_get_and_clear(mm, address, ptep, sz);
> tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
> if (huge_pte_dirty(pte))
> set_page_dirty(page);
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v2 2/4] arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes
2025-02-17 14:04 ` [PATCH v2 2/4] arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes Ryan Roberts
@ 2025-02-19 8:45 ` Anshuman Khandual
2025-02-19 8:58 ` Ryan Roberts
2025-02-19 19:03 ` Catalin Marinas
` (2 subsequent siblings)
3 siblings, 1 reply; 30+ messages in thread
From: Anshuman Khandual @ 2025-02-19 8:45 UTC (permalink / raw)
To: Ryan Roberts, Catalin Marinas, Will Deacon, Huacai Chen,
WANG Xuerui, Thomas Bogendoerfer, James E.J. Bottomley,
Helge Deller, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Naveen N Rao, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Gerald Schaefer, David S. Miller, Andreas Larsson, Arnd Bergmann,
Muchun Song, Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Dev Jain, Kevin Brodsky, Alexandre Ghiti
Cc: linux-arm-kernel, linux-mm, linux-kernel, stable
On 2/17/25 19:34, Ryan Roberts wrote:
> arm64 supports multiple huge_pte sizes. Some of the sizes are covered by
> a single pte entry at a particular level (PMD_SIZE, PUD_SIZE), and some
> are covered by multiple ptes at a particular level (CONT_PTE_SIZE,
> CONT_PMD_SIZE). So the function has to figure out the size from the
> huge_pte pointer. This was previously done by walking the pgtable to
> determine the level and by using the PTE_CONT bit to determine the
> number of ptes at the level.
>
> But the PTE_CONT bit is only valid when the pte is present. For
> non-present pte values (e.g. markers, migration entries), the previous
> implementation was therefore erroneously determining the size. There is
> at least one known caller in core-mm, move_huge_pte(), which may call
> huge_ptep_get_and_clear() for a non-present pte. So we must be robust to
> this case. Additionally the "regular" ptep_get_and_clear() is robust to
> being called for non-present ptes so it makes sense to follow the
> behaviour.
>
> Fix this by using the new sz parameter which is now provided to the
> function. Additionally when clearing each pte in a contig range, don't
> gather the access and dirty bits if the pte is not present.
>
> An alternative approach that would not require API changes would be to
> store the PTE_CONT bit in a spare bit in the swap entry pte for the
> non-present case. But it felt cleaner to follow other APIs' lead and
> just pass in the size.
>
> As an aside, PTE_CONT is bit 52, which corresponds to bit 40 in the swap
> entry offset field (layout of non-present pte). Since hugetlb is never
> swapped to disk, this field will only be populated for markers, which
> always set this bit to 0 and hwpoison swap entries, which set the offset
> field to a PFN; So it would only ever be 1 for a 52-bit PVA system where
> memory in that high half was poisoned (I think!). So in practice, this
> bit would almost always be zero for non-present ptes and we would only
> clear the first entry if it was actually a contiguous block. That's
> probably a less severe symptom than if it was always interpreted as 1
> and cleared out potentially-present neighboring PTEs.
>
> Cc: stable@vger.kernel.org
> Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
> arch/arm64/mm/hugetlbpage.c | 40 ++++++++++++++++---------------------
> 1 file changed, 17 insertions(+), 23 deletions(-)
>
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 06db4649af91..614b2feddba2 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -163,24 +163,23 @@ static pte_t get_clear_contig(struct mm_struct *mm,
> unsigned long pgsize,
> unsigned long ncontig)
> {
> - pte_t orig_pte = __ptep_get(ptep);
> - unsigned long i;
> -
> - for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
> - pte_t pte = __ptep_get_and_clear(mm, addr, ptep);
> -
> - /*
> - * If HW_AFDBM is enabled, then the HW could turn on
> - * the dirty or accessed bit for any page in the set,
> - * so check them all.
> - */
> - if (pte_dirty(pte))
> - orig_pte = pte_mkdirty(orig_pte);
> -
> - if (pte_young(pte))
> - orig_pte = pte_mkyoung(orig_pte);
> + pte_t pte, tmp_pte;
> + bool present;
> +
> + pte = __ptep_get_and_clear(mm, addr, ptep);
> + present = pte_present(pte);
pte_present() need not be evaluated for standard huge pages at [PMD|PUD]_SIZE,
e.g. when ncontig = 1 is passed in.
> + while (--ncontig) {
Should this be converted into a for loop instead, just to be in sync with the
other similar iterators in this file?
for (i = 1; i < ncontig; i++, addr += pgsize, ptep++)
{
tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
if (present) {
if (pte_dirty(tmp_pte))
pte = pte_mkdirty(pte);
if (pte_young(tmp_pte))
pte = pte_mkyoung(pte);
}
}
> + ptep++;
> + addr += pgsize;
> + tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
> + if (present) {
> + if (pte_dirty(tmp_pte))
> + pte = pte_mkdirty(pte);
> + if (pte_young(tmp_pte))
> + pte = pte_mkyoung(pte);
> + }
> }
> - return orig_pte;
> + return pte;
> }
>
> static pte_t get_clear_contig_flush(struct mm_struct *mm,
> @@ -401,13 +400,8 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
> {
> int ncontig;
> size_t pgsize;
> - pte_t orig_pte = __ptep_get(ptep);
> -
> - if (!pte_cont(orig_pte))
> - return __ptep_get_and_clear(mm, addr, ptep);
> -
> - ncontig = find_num_contig(mm, addr, ptep, &pgsize);
>
> + ncontig = num_contig_ptes(sz, &pgsize);
> return get_clear_contig(mm, addr, ptep, pgsize, ncontig);
> }
>
* Re: [PATCH v2 3/4] arm64: hugetlb: Fix flush_hugetlb_tlb_range() invalidation level
2025-02-17 14:04 ` [PATCH v2 3/4] arm64: hugetlb: Fix flush_hugetlb_tlb_range() invalidation level Ryan Roberts
@ 2025-02-19 8:56 ` Anshuman Khandual
2025-02-19 19:04 ` Catalin Marinas
1 sibling, 0 replies; 30+ messages in thread
From: Anshuman Khandual @ 2025-02-19 8:56 UTC (permalink / raw)
To: Ryan Roberts, Catalin Marinas, Will Deacon, Huacai Chen,
WANG Xuerui, Thomas Bogendoerfer, James E.J. Bottomley,
Helge Deller, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Naveen N Rao, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Gerald Schaefer, David S. Miller, Andreas Larsson, Arnd Bergmann,
Muchun Song, Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Dev Jain, Kevin Brodsky, Alexandre Ghiti
Cc: linux-arm-kernel, linux-mm, linux-kernel, stable
On 2/17/25 19:34, Ryan Roberts wrote:
> commit c910f2b65518 ("arm64/mm: Update tlb invalidation routines for
> FEAT_LPA2") changed the "invalidation level unknown" hint from 0 to
> TLBI_TTL_UNKNOWN (INT_MAX). But the fallback "unknown level" path in
> flush_hugetlb_tlb_range() was not updated. So as it stands, when trying
> to invalidate CONT_PMD_SIZE or CONT_PTE_SIZE hugetlb mappings, we will
> spuriously try to invalidate at level 0 on LPA2-enabled systems.
>
> Fix this so that the fallback passes TLBI_TTL_UNKNOWN, and while we are
> at it, explicitly use the correct stride and level for CONT_PMD_SIZE and
> CONT_PTE_SIZE, which should provide a minor optimization.
>
> Cc: stable@vger.kernel.org
> Fixes: c910f2b65518 ("arm64/mm: Update tlb invalidation routines for FEAT_LPA2")
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
LGTM
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
> arch/arm64/include/asm/hugetlb.h | 22 ++++++++++++++++------
> 1 file changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
> index 03db9cb21ace..07fbf5bf85a7 100644
> --- a/arch/arm64/include/asm/hugetlb.h
> +++ b/arch/arm64/include/asm/hugetlb.h
> @@ -76,12 +76,22 @@ static inline void flush_hugetlb_tlb_range(struct vm_area_struct *vma,
> {
> unsigned long stride = huge_page_size(hstate_vma(vma));
>
> - if (stride == PMD_SIZE)
> - __flush_tlb_range(vma, start, end, stride, false, 2);
> - else if (stride == PUD_SIZE)
> - __flush_tlb_range(vma, start, end, stride, false, 1);
> - else
> - __flush_tlb_range(vma, start, end, PAGE_SIZE, false, 0);
> + switch (stride) {
> +#ifndef __PAGETABLE_PMD_FOLDED
> + case PUD_SIZE:
> + __flush_tlb_range(vma, start, end, PUD_SIZE, false, 1);
> + break;
> +#endif
> + case CONT_PMD_SIZE:
> + case PMD_SIZE:
> + __flush_tlb_range(vma, start, end, PMD_SIZE, false, 2);
> + break;
> + case CONT_PTE_SIZE:
> + __flush_tlb_range(vma, start, end, PAGE_SIZE, false, 3);
> + break;
> + default:
> + __flush_tlb_range(vma, start, end, PAGE_SIZE, false, TLBI_TTL_UNKNOWN);
> + }
> }
>
> #endif /* __ASM_HUGETLB_H */
* Re: [PATCH v2 2/4] arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes
2025-02-19 8:45 ` Anshuman Khandual
@ 2025-02-19 8:58 ` Ryan Roberts
2025-02-20 6:37 ` Anshuman Khandual
0 siblings, 1 reply; 30+ messages in thread
From: Ryan Roberts @ 2025-02-19 8:58 UTC (permalink / raw)
To: Anshuman Khandual, Catalin Marinas, Will Deacon, Huacai Chen,
WANG Xuerui, Thomas Bogendoerfer, James E.J. Bottomley,
Helge Deller, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Naveen N Rao, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Gerald Schaefer, David S. Miller, Andreas Larsson, Arnd Bergmann,
Muchun Song, Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Dev Jain, Kevin Brodsky, Alexandre Ghiti
Cc: linux-arm-kernel, linux-mm, linux-kernel, stable
On 19/02/2025 08:45, Anshuman Khandual wrote:
>
>
> On 2/17/25 19:34, Ryan Roberts wrote:
>> arm64 supports multiple huge_pte sizes. Some of the sizes are covered by
>> a single pte entry at a particular level (PMD_SIZE, PUD_SIZE), and some
>> are covered by multiple ptes at a particular level (CONT_PTE_SIZE,
>> CONT_PMD_SIZE). So the function has to figure out the size from the
>> huge_pte pointer. This was previously done by walking the pgtable to
>> determine the level and by using the PTE_CONT bit to determine the
>> number of ptes at the level.
>>
>> But the PTE_CONT bit is only valid when the pte is present. For
>> non-present pte values (e.g. markers, migration entries), the previous
>> implementation was therefore erroneously determining the size. There is
>> at least one known caller in core-mm, move_huge_pte(), which may call
>> huge_ptep_get_and_clear() for a non-present pte. So we must be robust to
>> this case. Additionally the "regular" ptep_get_and_clear() is robust to
>> being called for non-present ptes so it makes sense to follow the
>> behaviour.
>>
>> Fix this by using the new sz parameter which is now provided to the
>> function. Additionally when clearing each pte in a contig range, don't
>> gather the access and dirty bits if the pte is not present.
>>
>> An alternative approach that would not require API changes would be to
>> store the PTE_CONT bit in a spare bit in the swap entry pte for the
>> non-present case. But it felt cleaner to follow other APIs' lead and
>> just pass in the size.
>>
>> As an aside, PTE_CONT is bit 52, which corresponds to bit 40 in the swap
>> entry offset field (layout of non-present pte). Since hugetlb is never
>> swapped to disk, this field will only be populated for markers, which
>> always set this bit to 0 and hwpoison swap entries, which set the offset
>> field to a PFN; So it would only ever be 1 for a 52-bit PVA system where
>> memory in that high half was poisoned (I think!). So in practice, this
>> bit would almost always be zero for non-present ptes and we would only
>> clear the first entry if it was actually a contiguous block. That's
>> probably a less severe symptom than if it was always interpreted as 1
>> and cleared out potentially-present neighboring PTEs.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>> arch/arm64/mm/hugetlbpage.c | 40 ++++++++++++++++---------------------
>> 1 file changed, 17 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>> index 06db4649af91..614b2feddba2 100644
>> --- a/arch/arm64/mm/hugetlbpage.c
>> +++ b/arch/arm64/mm/hugetlbpage.c
>> @@ -163,24 +163,23 @@ static pte_t get_clear_contig(struct mm_struct *mm,
>> unsigned long pgsize,
>> unsigned long ncontig)
>> {
>> - pte_t orig_pte = __ptep_get(ptep);
>> - unsigned long i;
>> -
>> - for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
>> - pte_t pte = __ptep_get_and_clear(mm, addr, ptep);
>> -
>> - /*
>> - * If HW_AFDBM is enabled, then the HW could turn on
>> - * the dirty or accessed bit for any page in the set,
>> - * so check them all.
>> - */
>> - if (pte_dirty(pte))
>> - orig_pte = pte_mkdirty(orig_pte);
>> -
>> - if (pte_young(pte))
>> - orig_pte = pte_mkyoung(orig_pte);
>> + pte_t pte, tmp_pte;
>> + bool present;
>> +
>> + pte = __ptep_get_and_clear(mm, addr, ptep);
>> + present = pte_present(pte);
>
> pte_present() may not be evaluated for standard huge pages at [PMD|PUD]_SIZE
> e.g when ncontig = 1 in the argument.
Sorry, I'm not quite sure what you're suggesting here. Are you proposing that
pte_present() should be moved into the loop so that we only call it when we are
going to consume it? I'm happy to do that if that's the preference, but I
thought it was neater to hoist it out of the loop.
>
>> + while (--ncontig) {
>
> Should this be converted into a for loop instead just to be in sync with other
> similar iterators in this file.
>
> for (i = 1; i < ncontig; i++, addr += pgsize, ptep++)
> {
> tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
> if (present) {
> if (pte_dirty(tmp_pte))
> pte = pte_mkdirty(pte);
> if (pte_young(tmp_pte))
> pte = pte_mkyoung(pte);
> }
> }
I think that, as written, this is incorrect. Let's say we have 16 ptes in the
block. We want to iterate over the last 15 of them (we have already read pte
0). But your loop iterates over the first 15, because addr and ptep are not
incremented until after the first pass through the loop body. So we would need
to explicitly increment both before entering the loop, and that is only
necessary if ncontig > 1. Personally, I think my approach is neater...
>
>> + ptep++;
>> + addr += pgsize;
>> + tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
>> + if (present) {
>> + if (pte_dirty(tmp_pte))
>> + pte = pte_mkdirty(pte);
>> + if (pte_young(tmp_pte))
>> + pte = pte_mkyoung(pte);
>> + }
>> }
>> - return orig_pte;
>> + return pte;
>> }
>>
>> static pte_t get_clear_contig_flush(struct mm_struct *mm,
>> @@ -401,13 +400,8 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
>> {
>> int ncontig;
>> size_t pgsize;
>> - pte_t orig_pte = __ptep_get(ptep);
>> -
>> - if (!pte_cont(orig_pte))
>> - return __ptep_get_and_clear(mm, addr, ptep);
>> -
>> - ncontig = find_num_contig(mm, addr, ptep, &pgsize);
>>
>> + ncontig = num_contig_ptes(sz, &pgsize);
>> return get_clear_contig(mm, addr, ptep, pgsize, ncontig);
>> }
>>
* Re: [PATCH v2 1/4] mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear()
2025-02-17 14:04 ` [PATCH v2 1/4] mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear() Ryan Roberts
2025-02-19 8:28 ` Anshuman Khandual
@ 2025-02-19 19:00 ` Catalin Marinas
2025-02-20 9:46 ` David Hildenbrand
` (2 subsequent siblings)
4 siblings, 0 replies; 30+ messages in thread
From: Catalin Marinas @ 2025-02-19 19:00 UTC (permalink / raw)
To: Ryan Roberts
Cc: Will Deacon, Huacai Chen, WANG Xuerui, Thomas Bogendoerfer,
James E.J. Bottomley, Helge Deller, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Naveen N Rao, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Gerald Schaefer,
David S. Miller, Andreas Larsson, Arnd Bergmann, Muchun Song,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
Alexandre Ghiti, linux-arm-kernel, linux-mm, linux-kernel,
stable
On Mon, Feb 17, 2025 at 02:04:14PM +0000, Ryan Roberts wrote:
> In order to fix a bug, arm64 needs to be told the size of the huge page
> for which the huge_pte is being set in huge_ptep_get_and_clear().
> Provide for this by adding an `unsigned long sz` parameter to the
> function. This follows the same pattern as huge_pte_clear() and
> set_huge_pte_at().
>
> This commit makes the required interface modifications to the core mm as
> well as all arches that implement this function (arm64, loongarch, mips,
> parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed
> in a separate commit.
>
> Cc: stable@vger.kernel.org
> Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
* Re: [PATCH v2 2/4] arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes
2025-02-17 14:04 ` [PATCH v2 2/4] arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes Ryan Roberts
2025-02-19 8:45 ` Anshuman Khandual
@ 2025-02-19 19:03 ` Catalin Marinas
2025-02-21 15:31 ` Will Deacon
2025-02-24 11:27 ` Alexandre Ghiti
3 siblings, 0 replies; 30+ messages in thread
From: Catalin Marinas @ 2025-02-19 19:03 UTC (permalink / raw)
To: Ryan Roberts
Cc: Will Deacon, Huacai Chen, WANG Xuerui, Thomas Bogendoerfer,
James E.J. Bottomley, Helge Deller, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Naveen N Rao, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Gerald Schaefer,
David S. Miller, Andreas Larsson, Arnd Bergmann, Muchun Song,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
Alexandre Ghiti, linux-arm-kernel, linux-mm, linux-kernel,
stable
On Mon, Feb 17, 2025 at 02:04:15PM +0000, Ryan Roberts wrote:
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 06db4649af91..614b2feddba2 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -163,24 +163,23 @@ static pte_t get_clear_contig(struct mm_struct *mm,
>                               unsigned long pgsize,
>                               unsigned long ncontig)
>  {
> -        pte_t orig_pte = __ptep_get(ptep);
> -        unsigned long i;
> -
> -        for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
> -                pte_t pte = __ptep_get_and_clear(mm, addr, ptep);
> -
> -                /*
> -                 * If HW_AFDBM is enabled, then the HW could turn on
> -                 * the dirty or accessed bit for any page in the set,
> -                 * so check them all.
> -                 */
> -                if (pte_dirty(pte))
> -                        orig_pte = pte_mkdirty(orig_pte);
> -
> -                if (pte_young(pte))
> -                        orig_pte = pte_mkyoung(orig_pte);
> +        pte_t pte, tmp_pte;
> +        bool present;
> +
> +        pte = __ptep_get_and_clear(mm, addr, ptep);
> +        present = pte_present(pte);
> +        while (--ncontig) {
A 'for' loop may be more consistent with the rest of the file but I
really don't mind the 'while' loop.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
* Re: [PATCH v2 3/4] arm64: hugetlb: Fix flush_hugetlb_tlb_range() invalidation level
2025-02-17 14:04 ` [PATCH v2 3/4] arm64: hugetlb: Fix flush_hugetlb_tlb_range() invalidation level Ryan Roberts
2025-02-19 8:56 ` Anshuman Khandual
@ 2025-02-19 19:04 ` Catalin Marinas
1 sibling, 0 replies; 30+ messages in thread
From: Catalin Marinas @ 2025-02-19 19:04 UTC (permalink / raw)
To: Ryan Roberts
Cc: Will Deacon, Huacai Chen, WANG Xuerui, Thomas Bogendoerfer,
James E.J. Bottomley, Helge Deller, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Naveen N Rao, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Gerald Schaefer,
David S. Miller, Andreas Larsson, Arnd Bergmann, Muchun Song,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
Alexandre Ghiti, linux-arm-kernel, linux-mm, linux-kernel,
stable
On Mon, Feb 17, 2025 at 02:04:16PM +0000, Ryan Roberts wrote:
> commit c910f2b65518 ("arm64/mm: Update tlb invalidation routines for
> FEAT_LPA2") changed the "invalidation level unknown" hint from 0 to
> TLBI_TTL_UNKNOWN (INT_MAX). But the fallback "unknown level" path in
> flush_hugetlb_tlb_range() was not updated. So as it stands, when trying
> to invalidate CONT_PMD_SIZE or CONT_PTE_SIZE hugetlb mappings, we will
> spuriously try to invalidate at level 0 on LPA2-enabled systems.
>
> Fix this so that the fallback passes TLBI_TTL_UNKNOWN, and while we are
> at it, explicitly use the correct stride and level for CONT_PMD_SIZE and
> CONT_PTE_SIZE, which should provide a minor optimization.
>
> Cc: stable@vger.kernel.org
> Fixes: c910f2b65518 ("arm64/mm: Update tlb invalidation routines for FEAT_LPA2")
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
* Re: [PATCH v2 4/4] mm: Don't skip arch_sync_kernel_mappings() in error paths
2025-02-17 14:04 ` [PATCH v2 4/4] mm: Don't skip arch_sync_kernel_mappings() in error paths Ryan Roberts
@ 2025-02-19 19:07 ` Catalin Marinas
0 siblings, 0 replies; 30+ messages in thread
From: Catalin Marinas @ 2025-02-19 19:07 UTC (permalink / raw)
To: Ryan Roberts
Cc: Will Deacon, Huacai Chen, WANG Xuerui, Thomas Bogendoerfer,
James E.J. Bottomley, Helge Deller, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Naveen N Rao, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Gerald Schaefer,
David S. Miller, Andreas Larsson, Arnd Bergmann, Muchun Song,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
Alexandre Ghiti, linux-arm-kernel, linux-mm, linux-kernel,
stable
On Mon, Feb 17, 2025 at 02:04:17PM +0000, Ryan Roberts wrote:
> Fix callers that previously skipped calling arch_sync_kernel_mappings()
> if an error occurred during a pgtable update. The call is still required
> to sync any pgtable updates that may have occurred prior to hitting the
> error condition.
>
> These are theoretical bugs discovered during code review.
>
> Cc: stable@vger.kernel.org
> Fixes: 2ba3e6947aed ("mm/vmalloc: track which page-table levels were modified")
> Fixes: 0c95cba49255 ("mm: apply_to_pte_range warn and fail if a large pte is encountered")
> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
* Re: [PATCH v2 0/4] Fixes for hugetlb and vmalloc on arm64
2025-02-17 14:04 [PATCH v2 0/4] Fixes for hugetlb and vmalloc on arm64 Ryan Roberts
` (3 preceding siblings ...)
2025-02-17 14:04 ` [PATCH v2 4/4] mm: Don't skip arch_sync_kernel_mappings() in error paths Ryan Roberts
@ 2025-02-19 19:10 ` Catalin Marinas
2025-02-21 15:35 ` Will Deacon
4 siblings, 1 reply; 30+ messages in thread
From: Catalin Marinas @ 2025-02-19 19:10 UTC (permalink / raw)
To: Will Deacon, Ryan Roberts
Cc: Huacai Chen, WANG Xuerui, Thomas Bogendoerfer,
James E.J. Bottomley, Helge Deller, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Naveen N Rao, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Gerald Schaefer,
David S. Miller, Andreas Larsson, Arnd Bergmann, Muchun Song,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
Alexandre Ghiti, linux-arm-kernel, linux-mm, linux-kernel
Will, Ryan,
On Mon, Feb 17, 2025 at 02:04:13PM +0000, Ryan Roberts wrote:
> This series contains some fixes for hugetlb on arm64, and is split out from v1
> of a wider series at [1]. While the last patch is technically targeting core-mm
> and is not directly related to arm64, I'd like it to go via the arm64 tree so
> that the wider performance improvement series (v2 to be posted shortly) that
> depends on this series doesn't have to be robust to the fix not being present.
>
> I've included maintainers/reviewers for all the arches that are (trivially)
> touched due to the API changes, hoping for some ACKs.
These fixes look fine to me and I think we should get them in for 6.14.
I think Andrew was ok with them going in via the arm64 tree:
https://lore.kernel.org/linux-arm-kernel/20250205235219.3c3a4b968087d1386d708b04@linux-foundation.org/
--
Catalin
* Re: [PATCH v2 2/4] arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes
2025-02-19 8:58 ` Ryan Roberts
@ 2025-02-20 6:37 ` Anshuman Khandual
2025-02-21 9:55 ` Catalin Marinas
0 siblings, 1 reply; 30+ messages in thread
From: Anshuman Khandual @ 2025-02-20 6:37 UTC (permalink / raw)
To: Ryan Roberts, Catalin Marinas, Will Deacon, Huacai Chen,
WANG Xuerui, Thomas Bogendoerfer, James E.J. Bottomley,
Helge Deller, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Naveen N Rao, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Gerald Schaefer, David S. Miller, Andreas Larsson, Arnd Bergmann,
Muchun Song, Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Dev Jain, Kevin Brodsky, Alexandre Ghiti
Cc: linux-arm-kernel, linux-mm, linux-kernel, stable
On 2/19/25 14:28, Ryan Roberts wrote:
> On 19/02/2025 08:45, Anshuman Khandual wrote:
>>
>>
>> On 2/17/25 19:34, Ryan Roberts wrote:
>>> arm64 supports multiple huge_pte sizes. Some of the sizes are covered by
>>> a single pte entry at a particular level (PMD_SIZE, PUD_SIZE), and some
>>> are covered by multiple ptes at a particular level (CONT_PTE_SIZE,
>>> CONT_PMD_SIZE). So the function has to figure out the size from the
>>> huge_pte pointer. This was previously done by walking the pgtable to
>>> determine the level and by using the PTE_CONT bit to determine the
>>> number of ptes at the level.
>>>
>>> But the PTE_CONT bit is only valid when the pte is present. For
>>> non-present pte values (e.g. markers, migration entries), the previous
>>> implementation was therefore erroniously determining the size. There is
typo - s/erroniously/erroneously ^^^^^^
>>> at least one known caller in core-mm, move_huge_pte(), which may call
>>> huge_ptep_get_and_clear() for a non-present pte. So we must be robust to
>>> this case. Additionally the "regular" ptep_get_and_clear() is robust to
>>> being called for non-present ptes so it makes sense to follow the
>>> behaviour.
>>>
>>> Fix this by using the new sz parameter which is now provided to the
>>> function. Additionally when clearing each pte in a contig range, don't
>>> gather the access and dirty bits if the pte is not present.
>>>
>>> An alternative approach that would not require API changes would be to
>>> store the PTE_CONT bit in a spare bit in the swap entry pte for the
>>> non-present case. But it felt cleaner to follow other APIs' lead and
>>> just pass in the size.
>>>
>>> As an aside, PTE_CONT is bit 52, which corresponds to bit 40 in the swap
>>> entry offset field (layout of non-present pte). Since hugetlb is never
>>> swapped to disk, this field will only be populated for markers, which
>>> always set this bit to 0 and hwpoison swap entries, which set the offset
>>> field to a PFN; So it would only ever be 1 for a 52-bit PVA system where
>>> memory in that high half was poisoned (I think!). So in practice, this
>>> bit would almost always be zero for non-present ptes and we would only
>>> clear the first entry if it was actually a contiguous block. That's
>>> probably a less severe symptom than if it was always interpretted as 1
typo - s/interpretted/interpreted ^^^^^^
>>> and cleared out potentially-present neighboring PTEs.
>>>
>>> Cc: stable@vger.kernel.org
>>> Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>> ---
>>> arch/arm64/mm/hugetlbpage.c | 40 ++++++++++++++++---------------------
>>> 1 file changed, 17 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>>> index 06db4649af91..614b2feddba2 100644
>>> --- a/arch/arm64/mm/hugetlbpage.c
>>> +++ b/arch/arm64/mm/hugetlbpage.c
>>> @@ -163,24 +163,23 @@ static pte_t get_clear_contig(struct mm_struct *mm,
>>>                               unsigned long pgsize,
>>>                               unsigned long ncontig)
>>>  {
>>> -        pte_t orig_pte = __ptep_get(ptep);
>>> -        unsigned long i;
>>> -
>>> -        for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
>>> -                pte_t pte = __ptep_get_and_clear(mm, addr, ptep);
>>> -
>>> -                /*
>>> -                 * If HW_AFDBM is enabled, then the HW could turn on
>>> -                 * the dirty or accessed bit for any page in the set,
>>> -                 * so check them all.
>>> -                 */
>>> -                if (pte_dirty(pte))
>>> -                        orig_pte = pte_mkdirty(orig_pte);
>>> -
>>> -                if (pte_young(pte))
>>> -                        orig_pte = pte_mkyoung(orig_pte);
>>> +        pte_t pte, tmp_pte;
>>> +        bool present;
>>> +
>>> +        pte = __ptep_get_and_clear(mm, addr, ptep);
>>> +        present = pte_present(pte);
>>
>> pte_present() may not be evaluated for standard huge pages at [PMD|PUD]_SIZE
>> e.g. when ncontig = 1 in the argument.
>
> Sorry I'm not quite sure what you're suggesting here? Are you proposing that
> pte_present() should be moved into the loop so that we only actually call it
> when we are going to consume it? I'm happy to do that if it's the preference,
Right, pte_present() is only required for the cont huge pages but not for the
normal huge pages.
> but I thought it was neater to hoist it out of the loop.
Agreed, but when possible pte_present() cost should be avoided for the normal
huge pages where it is not required.
>
>>
>>> +        while (--ncontig) {
>>
>> Should this be converted into a for loop instead just to be in sync with other
>> similar iterators in this file.
>>
>> for (i = 1; i < ncontig; i++, addr += pgsize, ptep++)
>> {
>>         tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
>>         if (present) {
>>                 if (pte_dirty(tmp_pte))
>>                         pte = pte_mkdirty(pte);
>>                 if (pte_young(tmp_pte))
>>                         pte = pte_mkyoung(pte);
>>         }
>> }
>
> I think the way you have written this it's incorrect. Let's say we have 16 ptes
> in the block. We want to iterate over the last 15 of them (we have already read
> pte 0). But you're iterating over the first 15 because you don't increment addr
> and ptep until after you've been around the loop the first time. So we would
> need to explicitly increment those 2 before entering the loop. But that is only
> necessary if ncontig > 1. Personally I think my approach is neater...
Thinking about this again. Just wondering should not a pte_present()
check on each entry being cleared along with (ncontig > 1) in this
existing loop before transferring over the dirty and accessed bits -
also work as intended with less code churn ?
static pte_t get_clear_contig(struct mm_struct *mm,
                              unsigned long addr,
                              pte_t *ptep,
                              unsigned long pgsize,
                              unsigned long ncontig)
{
        pte_t orig_pte = __ptep_get(ptep);
        unsigned long i;

        for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
                pte_t pte = __ptep_get_and_clear(mm, addr, ptep);

                if (ncontig > 1 && !pte_present(pte))
                        continue;

                /*
                 * If HW_AFDBM is enabled, then the HW could turn on
                 * the dirty or accessed bit for any page in the set,
                 * so check them all.
                 */
                if (pte_dirty(pte))
                        orig_pte = pte_mkdirty(orig_pte);

                if (pte_young(pte))
                        orig_pte = pte_mkyoung(orig_pte);
        }

        return orig_pte;
}
* Normal huge pages
- enters the for loop just once
- clears the single entry
- always transfers dirty and access bits
- pte_present() does not matter as ncontig = 1
* Contig huge pages
- enters the for loop ncontig times - for each sub page
- clears all sub page entries
- transfers dirty and access bits only when pte_present()
- pte_present() is relevant as ncontig > 1
>
>>
>>> +        ptep++;
>>> +        addr += pgsize;
>>> +        tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
>>> +        if (present) {
>>> +                if (pte_dirty(tmp_pte))
>>> +                        pte = pte_mkdirty(pte);
>>> +                if (pte_young(tmp_pte))
>>> +                        pte = pte_mkyoung(pte);
>>> +        }
>>>          }
>>> -        return orig_pte;
>>> +        return pte;
>>>  }
>>>
>>> static pte_t get_clear_contig_flush(struct mm_struct *mm,
>>> @@ -401,13 +400,8 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
>>>  {
>>>          int ncontig;
>>>          size_t pgsize;
>>> -        pte_t orig_pte = __ptep_get(ptep);
>>> -
>>> -        if (!pte_cont(orig_pte))
>>> -                return __ptep_get_and_clear(mm, addr, ptep);
>>> -
>>> -        ncontig = find_num_contig(mm, addr, ptep, &pgsize);
>>>
>>> +        ncontig = num_contig_ptes(sz, &pgsize);
>>>          return get_clear_contig(mm, addr, ptep, pgsize, ncontig);
>>>  }
>>>
>
* Re: [PATCH v2 1/4] mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear()
2025-02-17 14:04 ` [PATCH v2 1/4] mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear() Ryan Roberts
2025-02-19 8:28 ` Anshuman Khandual
2025-02-19 19:00 ` Catalin Marinas
@ 2025-02-20 9:46 ` David Hildenbrand
2025-02-24 10:49 ` Alexandre Ghiti
2025-02-25 14:25 ` Alexander Gordeev
2025-02-26 7:37 ` Christophe Leroy
4 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2025-02-20 9:46 UTC (permalink / raw)
To: Ryan Roberts, Catalin Marinas, Will Deacon, Huacai Chen,
WANG Xuerui, Thomas Bogendoerfer, James E.J. Bottomley,
Helge Deller, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Naveen N Rao, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Gerald Schaefer, David S. Miller, Andreas Larsson, Arnd Bergmann,
Muchun Song, Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
Alexandre Ghiti
Cc: linux-arm-kernel, linux-mm, linux-kernel, stable
On 17.02.25 15:04, Ryan Roberts wrote:
> In order to fix a bug, arm64 needs to be told the size of the huge page
> for which the huge_pte is being set in huge_ptep_get_and_clear().
s/set/cleared/ ?
> Provide for this by adding an `unsigned long sz` parameter to the
> function. This follows the same pattern as huge_pte_clear() and
> set_huge_pte_at().
>
> This commit makes the required interface modifications to the core mm as
> well as all arches that implement this function (arm64, loongarch, mips,
> parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed
> in a separate commit.
>
> Cc: stable@vger.kernel.org
> Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH v2 2/4] arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes
2025-02-20 6:37 ` Anshuman Khandual
@ 2025-02-21 9:55 ` Catalin Marinas
2025-02-21 10:26 ` Ryan Roberts
0 siblings, 1 reply; 30+ messages in thread
From: Catalin Marinas @ 2025-02-21 9:55 UTC (permalink / raw)
To: Anshuman Khandual
Cc: Ryan Roberts, Will Deacon, Huacai Chen, WANG Xuerui,
Thomas Bogendoerfer, James E.J. Bottomley, Helge Deller,
Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Naveen N Rao, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Gerald Schaefer,
David S. Miller, Andreas Larsson, Arnd Bergmann, Muchun Song,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Dev Jain, Kevin Brodsky, Alexandre Ghiti,
linux-arm-kernel, linux-mm, linux-kernel, stable
On Thu, Feb 20, 2025 at 12:07:35PM +0530, Anshuman Khandual wrote:
> On 2/19/25 14:28, Ryan Roberts wrote:
> > On 19/02/2025 08:45, Anshuman Khandual wrote:
> >> On 2/17/25 19:34, Ryan Roberts wrote:
> >>> +        while (--ncontig) {
> >>
> >> Should this be converted into a for loop instead just to be in sync with other
> >> similar iterators in this file.
> >>
> >> for (i = 1; i < ncontig; i++, addr += pgsize, ptep++)
> >> {
> >>         tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
> >>         if (present) {
> >>                 if (pte_dirty(tmp_pte))
> >>                         pte = pte_mkdirty(pte);
> >>                 if (pte_young(tmp_pte))
> >>                         pte = pte_mkyoung(pte);
> >>         }
> >> }
> >
> > I think the way you have written this it's incorrect. Let's say we have 16 ptes
> > in the block. We want to iterate over the last 15 of them (we have already read
> > pte 0). But you're iterating over the first 15 because you don't increment addr
> > and ptep until after you've been around the loop the first time. So we would
> > need to explicitly increment those 2 before entering the loop. But that is only
> > necessary if ncontig > 1. Personally I think my approach is neater...
>
> Thinking about this again. Just wondering should not a pte_present()
> check on each entry being cleared along with (ncontig > 1) in this
> existing loop before transferring over the dirty and accessed bits -
> also work as intended with less code churn ?
Shouldn't all the ptes in a contig block be either all present or all
not-present? Is there any point in checking each individually?
--
Catalin
* Re: [PATCH v2 2/4] arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes
2025-02-21 9:55 ` Catalin Marinas
@ 2025-02-21 10:26 ` Ryan Roberts
0 siblings, 0 replies; 30+ messages in thread
From: Ryan Roberts @ 2025-02-21 10:26 UTC (permalink / raw)
To: Catalin Marinas, Anshuman Khandual
Cc: Will Deacon, Huacai Chen, WANG Xuerui, Thomas Bogendoerfer,
James E.J. Bottomley, Helge Deller, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Naveen N Rao, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Gerald Schaefer,
David S. Miller, Andreas Larsson, Arnd Bergmann, Muchun Song,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Dev Jain, Kevin Brodsky, Alexandre Ghiti,
linux-arm-kernel, linux-mm, linux-kernel, stable
On 21/02/2025 09:55, Catalin Marinas wrote:
> On Thu, Feb 20, 2025 at 12:07:35PM +0530, Anshuman Khandual wrote:
>> On 2/19/25 14:28, Ryan Roberts wrote:
>>> On 19/02/2025 08:45, Anshuman Khandual wrote:
>>>> On 2/17/25 19:34, Ryan Roberts wrote:
>>>>> +        while (--ncontig) {
>>>>
>>>> Should this be converted into a for loop instead just to be in sync with other
>>>> similar iterators in this file.
>>>>
>>>> for (i = 1; i < ncontig; i++, addr += pgsize, ptep++)
>>>> {
>>>>         tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
>>>>         if (present) {
>>>>                 if (pte_dirty(tmp_pte))
>>>>                         pte = pte_mkdirty(pte);
>>>>                 if (pte_young(tmp_pte))
>>>>                         pte = pte_mkyoung(pte);
>>>>         }
>>>> }
>>>
>>> I think the way you have written this it's incorrect. Let's say we have 16 ptes
>>> in the block. We want to iterate over the last 15 of them (we have already read
>>> pte 0). But you're iterating over the first 15 because you don't increment addr
>>> and ptep until after you've been around the loop the first time. So we would
>>> need to explicitly increment those 2 before entering the loop. But that is only
>>> necessary if ncontig > 1. Personally I think my approach is neater...
>>
>> Thinking about this again. Just wondering should not a pte_present()
>> check on each entry being cleared along with (ncontig > 1) in this
>> existing loop before transferring over the dirty and accessed bits -
>> also work as intended with less code churn ?
>
> Shouldn't all the ptes in a contig block be either all present or all
> not-present? Is there any point in checking each individually?
I agree, that's why I was just testing it once outside the loop. I'm confident
my code and Anshuman's alternative are both correct. So it's just a stylistic
choice really. I'd prefer to leave it as is, but will change if there is
consensus. I'm not sure I'm hearing that though.
Catalin, do you want me to respin the series to fix up the typos that Anshuman
highlighted, or will you handle that?
Thanks,
Ryan
* Re: [PATCH v2 2/4] arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes
2025-02-17 14:04 ` [PATCH v2 2/4] arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes Ryan Roberts
2025-02-19 8:45 ` Anshuman Khandual
2025-02-19 19:03 ` Catalin Marinas
@ 2025-02-21 15:31 ` Will Deacon
2025-02-24 12:11 ` Ryan Roberts
2025-02-24 11:27 ` Alexandre Ghiti
3 siblings, 1 reply; 30+ messages in thread
From: Will Deacon @ 2025-02-21 15:31 UTC (permalink / raw)
To: Ryan Roberts
Cc: Catalin Marinas, Huacai Chen, WANG Xuerui, Thomas Bogendoerfer,
James E.J. Bottomley, Helge Deller, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Naveen N Rao, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Gerald Schaefer,
David S. Miller, Andreas Larsson, Arnd Bergmann, Muchun Song,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
Alexandre Ghiti, linux-arm-kernel, linux-mm, linux-kernel,
stable
On Mon, Feb 17, 2025 at 02:04:15PM +0000, Ryan Roberts wrote:
> arm64 supports multiple huge_pte sizes. Some of the sizes are covered by
> a single pte entry at a particular level (PMD_SIZE, PUD_SIZE), and some
> are covered by multiple ptes at a particular level (CONT_PTE_SIZE,
> CONT_PMD_SIZE). So the function has to figure out the size from the
> huge_pte pointer. This was previously done by walking the pgtable to
> determine the level and by using the PTE_CONT bit to determine the
> number of ptes at the level.
>
> But the PTE_CONT bit is only valid when the pte is present. For
> non-present pte values (e.g. markers, migration entries), the previous
> implementation was therefore erroniously determining the size. There is
> at least one known caller in core-mm, move_huge_pte(), which may call
> huge_ptep_get_and_clear() for a non-present pte. So we must be robust to
> this case. Additionally the "regular" ptep_get_and_clear() is robust to
> being called for non-present ptes so it makes sense to follow the
> behaviour.
>
> Fix this by using the new sz parameter which is now provided to the
> function. Additionally when clearing each pte in a contig range, don't
> gather the access and dirty bits if the pte is not present.
>
> An alternative approach that would not require API changes would be to
> store the PTE_CONT bit in a spare bit in the swap entry pte for the
> non-present case. But it felt cleaner to follow other APIs' lead and
> just pass in the size.
>
> As an aside, PTE_CONT is bit 52, which corresponds to bit 40 in the swap
> entry offset field (layout of non-present pte). Since hugetlb is never
> swapped to disk, this field will only be populated for markers, which
> always set this bit to 0 and hwpoison swap entries, which set the offset
> field to a PFN; So it would only ever be 1 for a 52-bit PVA system where
> memory in that high half was poisoned (I think!). So in practice, this
> bit would almost always be zero for non-present ptes and we would only
> clear the first entry if it was actually a contiguous block. That's
> probably a less severe symptom than if it was always interpretted as 1
> and cleared out potentially-present neighboring PTEs.
>
> Cc: stable@vger.kernel.org
> Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
> arch/arm64/mm/hugetlbpage.c | 40 ++++++++++++++++---------------------
> 1 file changed, 17 insertions(+), 23 deletions(-)
>
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 06db4649af91..614b2feddba2 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -163,24 +163,23 @@ static pte_t get_clear_contig(struct mm_struct *mm,
>                               unsigned long pgsize,
>                               unsigned long ncontig)
>  {
> -        pte_t orig_pte = __ptep_get(ptep);
> -        unsigned long i;
> -
> -        for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
> -                pte_t pte = __ptep_get_and_clear(mm, addr, ptep);
> -
> -                /*
> -                 * If HW_AFDBM is enabled, then the HW could turn on
> -                 * the dirty or accessed bit for any page in the set,
> -                 * so check them all.
> -                 */
> -                if (pte_dirty(pte))
> -                        orig_pte = pte_mkdirty(orig_pte);
> -
> -                if (pte_young(pte))
> -                        orig_pte = pte_mkyoung(orig_pte);
> +        pte_t pte, tmp_pte;
> +        bool present;
> +
> +        pte = __ptep_get_and_clear(mm, addr, ptep);
> +        present = pte_present(pte);
> +        while (--ncontig) {
> +                ptep++;
> +                addr += pgsize;
> +                tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
> +                if (present) {
> +                        if (pte_dirty(tmp_pte))
> +                                pte = pte_mkdirty(pte);
> +                        if (pte_young(tmp_pte))
> +                                pte = pte_mkyoung(pte);
> +                }
>          }
nit: With the loop now structured like this, we really can't handle
num_contig_ptes() returning 0 if it gets an unknown size. Granted, that
really shouldn't happen, but perhaps it would be better to add a 'default'
case with a WARN() to num_contig_ptes() and then add an early return here?
Will
* Re: [PATCH v2 0/4] Fixes for hugetlb and vmalloc on arm64
2025-02-19 19:10 ` [PATCH v2 0/4] Fixes for hugetlb and vmalloc on arm64 Catalin Marinas
@ 2025-02-21 15:35 ` Will Deacon
2025-02-24 12:13 ` Ryan Roberts
0 siblings, 1 reply; 30+ messages in thread
From: Will Deacon @ 2025-02-21 15:35 UTC (permalink / raw)
To: Catalin Marinas
Cc: Ryan Roberts, Huacai Chen, WANG Xuerui, Thomas Bogendoerfer,
James E.J. Bottomley, Helge Deller, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Naveen N Rao, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Gerald Schaefer,
David S. Miller, Andreas Larsson, Arnd Bergmann, Muchun Song,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
Alexandre Ghiti, linux-arm-kernel, linux-mm, linux-kernel
On Wed, Feb 19, 2025 at 07:10:32PM +0000, Catalin Marinas wrote:
> Will, Ryan,
>
> On Mon, Feb 17, 2025 at 02:04:13PM +0000, Ryan Roberts wrote:
> > This series contains some fixes for hugetlb on arm64, and is split out from v1
> > of a wider series at [1]. While the last patch is technically targeting core-mm
> > and is not directly related to arm64, I'd like it to go via the arm64 tree so
> > that the wider performance improvement series (v2 to be posted shortly) that
> > depends on this series doesn't have to be robust to the fix not being present.
> >
> > I've included maintainers/reviewers for all the arches that are (trivially)
> > touched due to the API changes, hoping for some ACKs.
>
> These fixes look fine to me and I think we should get them in for 6.14.
> I think Andrew was ok with them going in via the arm64 tree:
>
> https://lore.kernel.org/linux-arm-kernel/20250205235219.3c3a4b968087d1386d708b04@linux-foundation.org/
I think the diffstat looks worse than it really is, as the arch changes
are reasonably mechanical. I'd like to let it sit in next for a few days
though, so I'll pick this up once we've resolved my comment on patch #2.
Ryan -- did you find all of these issues by inspection, or are you aware
of anybody hitting them in practice?
Will
* Re: [PATCH v2 1/4] mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear()
2025-02-20 9:46 ` David Hildenbrand
@ 2025-02-24 10:49 ` Alexandre Ghiti
0 siblings, 0 replies; 30+ messages in thread
From: Alexandre Ghiti @ 2025-02-24 10:49 UTC (permalink / raw)
To: David Hildenbrand
Cc: Ryan Roberts, Catalin Marinas, Will Deacon, Huacai Chen,
WANG Xuerui, Thomas Bogendoerfer, James E.J. Bottomley,
Helge Deller, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Naveen N Rao, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Gerald Schaefer, David S. Miller, Andreas Larsson, Arnd Bergmann,
Muchun Song, Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
linux-arm-kernel, linux-mm, linux-kernel, stable
Hi Ryan,
On Thu, Feb 20, 2025 at 10:46 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 17.02.25 15:04, Ryan Roberts wrote:
> > In order to fix a bug, arm64 needs to be told the size of the huge page
> > for which the huge_pte is being set in huge_ptep_get_and_clear().
>
> s/set/cleared/ ?
>
> > Provide for this by adding an `unsigned long sz` parameter to the
> > function. This follows the same pattern as huge_pte_clear() and
> > set_huge_pte_at().
> >
> > This commit makes the required interface modifications to the core mm as
> > well as all arches that implement this function (arm64, loongarch, mips,
> > parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed
> > in a separate commit.
> >
> > Cc: stable@vger.kernel.org
> > Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
> > Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv
Thanks,
Alex
> > ---
>
> Acked-by: David Hildenbrand <david@redhat.com>
>
> --
> Cheers,
>
> David / dhildenb
>
* Re: [PATCH v2 2/4] arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes
2025-02-17 14:04 ` [PATCH v2 2/4] arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes Ryan Roberts
` (2 preceding siblings ...)
2025-02-21 15:31 ` Will Deacon
@ 2025-02-24 11:27 ` Alexandre Ghiti
3 siblings, 0 replies; 30+ messages in thread
From: Alexandre Ghiti @ 2025-02-24 11:27 UTC (permalink / raw)
To: Ryan Roberts
Cc: Catalin Marinas, Will Deacon, Huacai Chen, WANG Xuerui,
Thomas Bogendoerfer, James E.J. Bottomley, Helge Deller,
Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Naveen N Rao, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Gerald Schaefer,
David S. Miller, Andreas Larsson, Arnd Bergmann, Muchun Song,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
linux-arm-kernel, linux-mm, linux-kernel, stable
Hi Ryan,
On Mon, Feb 17, 2025 at 3:04 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> arm64 supports multiple huge_pte sizes. Some of the sizes are covered by
> a single pte entry at a particular level (PMD_SIZE, PUD_SIZE), and some
> are covered by multiple ptes at a particular level (CONT_PTE_SIZE,
> CONT_PMD_SIZE). So the function has to figure out the size from the
> huge_pte pointer. This was previously done by walking the pgtable to
> determine the level and by using the PTE_CONT bit to determine the
> number of ptes at the level.
>
> But the PTE_CONT bit is only valid when the pte is present. For
> non-present pte values (e.g. markers, migration entries), the previous
> implementation was therefore erroneously determining the size. There is
> at least one known caller in core-mm, move_huge_pte(), which may call
> huge_ptep_get_and_clear() for a non-present pte. So we must be robust to
> this case. Additionally the "regular" ptep_get_and_clear() is robust to
> being called for non-present ptes so it makes sense to follow the
> behaviour.
>
> Fix this by using the new sz parameter which is now provided to the
> function. Additionally when clearing each pte in a contig range, don't
> gather the access and dirty bits if the pte is not present.
>
> An alternative approach that would not require API changes would be to
> store the PTE_CONT bit in a spare bit in the swap entry pte for the
> non-present case. But it felt cleaner to follow other APIs' lead and
> just pass in the size.
>
> As an aside, PTE_CONT is bit 52, which corresponds to bit 40 in the swap
> entry offset field (layout of non-present pte). Since hugetlb is never
> swapped to disk, this field will only be populated for markers, which
> always set this bit to 0, and hwpoison swap entries, which set the offset
> field to a PFN; so it would only ever be 1 for a 52-bit PVA system where
> memory in that high half was poisoned (I think!). So in practice, this
> bit would almost always be zero for non-present ptes and we would only
> clear the first entry if it was actually a contiguous block. That's
> probably a less severe symptom than if it was always interpreted as 1
> and cleared out potentially-present neighboring PTEs.
>
> Cc: stable@vger.kernel.org
> Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
> arch/arm64/mm/hugetlbpage.c | 40 ++++++++++++++++---------------------
> 1 file changed, 17 insertions(+), 23 deletions(-)
>
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 06db4649af91..614b2feddba2 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -163,24 +163,23 @@ static pte_t get_clear_contig(struct mm_struct *mm,
> unsigned long pgsize,
> unsigned long ncontig)
> {
> - pte_t orig_pte = __ptep_get(ptep);
> - unsigned long i;
> -
> - for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
> - pte_t pte = __ptep_get_and_clear(mm, addr, ptep);
> -
> - /*
> - * If HW_AFDBM is enabled, then the HW could turn on
> - * the dirty or accessed bit for any page in the set,
> - * so check them all.
> - */
> - if (pte_dirty(pte))
> - orig_pte = pte_mkdirty(orig_pte);
> -
> - if (pte_young(pte))
> - orig_pte = pte_mkyoung(orig_pte);
> + pte_t pte, tmp_pte;
> + bool present;
> +
> + pte = __ptep_get_and_clear(mm, addr, ptep);
> + present = pte_present(pte);
> + while (--ncontig) {
> + ptep++;
> + addr += pgsize;
> + tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
> + if (present) {
> + if (pte_dirty(tmp_pte))
> + pte = pte_mkdirty(pte);
> + if (pte_young(tmp_pte))
> + pte = pte_mkyoung(pte);
> + }
> }
> - return orig_pte;
> + return pte;
> }
>
> static pte_t get_clear_contig_flush(struct mm_struct *mm,
> @@ -401,13 +400,8 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
> {
> int ncontig;
> size_t pgsize;
> - pte_t orig_pte = __ptep_get(ptep);
> -
> - if (!pte_cont(orig_pte))
> - return __ptep_get_and_clear(mm, addr, ptep);
> -
> - ncontig = find_num_contig(mm, addr, ptep, &pgsize);
>
> + ncontig = num_contig_ptes(sz, &pgsize);
> return get_clear_contig(mm, addr, ptep, pgsize, ncontig);
> }
>
> --
> 2.43.0
>
Thanks for ccing me on this fix; we have the same issue (in
huge_pte_clear() too) in riscv. I'll come up with a fix and wait for
you guys to merge this.
That only reinforces my idea of merging the riscv and arm64 implementations :)
Thanks again,
Alex
* Re: [PATCH v2 2/4] arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes
2025-02-21 15:31 ` Will Deacon
@ 2025-02-24 12:11 ` Ryan Roberts
2025-02-25 22:18 ` Will Deacon
0 siblings, 1 reply; 30+ messages in thread
From: Ryan Roberts @ 2025-02-24 12:11 UTC (permalink / raw)
To: Will Deacon
Cc: Catalin Marinas, Huacai Chen, WANG Xuerui, Thomas Bogendoerfer,
James E.J. Bottomley, Helge Deller, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Naveen N Rao, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Gerald Schaefer,
David S. Miller, Andreas Larsson, Arnd Bergmann, Muchun Song,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
Alexandre Ghiti, linux-arm-kernel, linux-mm, linux-kernel,
stable
On 21/02/2025 15:31, Will Deacon wrote:
> On Mon, Feb 17, 2025 at 02:04:15PM +0000, Ryan Roberts wrote:
>> arm64 supports multiple huge_pte sizes. Some of the sizes are covered by
>> a single pte entry at a particular level (PMD_SIZE, PUD_SIZE), and some
>> are covered by multiple ptes at a particular level (CONT_PTE_SIZE,
>> CONT_PMD_SIZE). So the function has to figure out the size from the
>> huge_pte pointer. This was previously done by walking the pgtable to
>> determine the level and by using the PTE_CONT bit to determine the
>> number of ptes at the level.
>>
>> But the PTE_CONT bit is only valid when the pte is present. For
>> non-present pte values (e.g. markers, migration entries), the previous
>> implementation was therefore erroneously determining the size. There is
>> at least one known caller in core-mm, move_huge_pte(), which may call
>> huge_ptep_get_and_clear() for a non-present pte. So we must be robust to
>> this case. Additionally the "regular" ptep_get_and_clear() is robust to
>> being called for non-present ptes so it makes sense to follow the
>> behaviour.
>>
>> Fix this by using the new sz parameter which is now provided to the
>> function. Additionally when clearing each pte in a contig range, don't
>> gather the access and dirty bits if the pte is not present.
>>
>> An alternative approach that would not require API changes would be to
>> store the PTE_CONT bit in a spare bit in the swap entry pte for the
>> non-present case. But it felt cleaner to follow other APIs' lead and
>> just pass in the size.
>>
>> As an aside, PTE_CONT is bit 52, which corresponds to bit 40 in the swap
>> entry offset field (layout of non-present pte). Since hugetlb is never
>> swapped to disk, this field will only be populated for markers, which
>> always set this bit to 0, and hwpoison swap entries, which set the offset
>> field to a PFN; so it would only ever be 1 for a 52-bit PVA system where
>> memory in that high half was poisoned (I think!). So in practice, this
>> bit would almost always be zero for non-present ptes and we would only
>> clear the first entry if it was actually a contiguous block. That's
>> probably a less severe symptom than if it was always interpreted as 1
>> and cleared out potentially-present neighboring PTEs.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>> arch/arm64/mm/hugetlbpage.c | 40 ++++++++++++++++---------------------
>> 1 file changed, 17 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>> index 06db4649af91..614b2feddba2 100644
>> --- a/arch/arm64/mm/hugetlbpage.c
>> +++ b/arch/arm64/mm/hugetlbpage.c
>> @@ -163,24 +163,23 @@ static pte_t get_clear_contig(struct mm_struct *mm,
>> unsigned long pgsize,
>> unsigned long ncontig)
>> {
>> - pte_t orig_pte = __ptep_get(ptep);
>> - unsigned long i;
>> -
>> - for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
>> - pte_t pte = __ptep_get_and_clear(mm, addr, ptep);
>> -
>> - /*
>> - * If HW_AFDBM is enabled, then the HW could turn on
>> - * the dirty or accessed bit for any page in the set,
>> - * so check them all.
>> - */
>> - if (pte_dirty(pte))
>> - orig_pte = pte_mkdirty(orig_pte);
>> -
>> - if (pte_young(pte))
>> - orig_pte = pte_mkyoung(orig_pte);
>> + pte_t pte, tmp_pte;
>> + bool present;
>> +
>> + pte = __ptep_get_and_clear(mm, addr, ptep);
>> + present = pte_present(pte);
>> + while (--ncontig) {
>> + ptep++;
>> + addr += pgsize;
>> + tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
>> + if (present) {
>> + if (pte_dirty(tmp_pte))
>> + pte = pte_mkdirty(pte);
>> + if (pte_young(tmp_pte))
>> + pte = pte_mkyoung(pte);
>> + }
>> }
>
> nit: With the loop now structured like this, we really can't handle
> num_contig_ptes() returning 0 if it gets an unknown size. Granted, that
> really shouldn't happen, but perhaps it would be better to add a 'default'
> case with a WARN() to num_contig_ptes() and then add an early return here?
Looking at other users of num_contig_ptes() it looks like huge_ptep_get()
already assumes at least 1 pte (it calls __ptep_get() before calling
num_contig_ptes()) and set_huge_pte_at() assumes 1 pte for the "present and
non-contig" case. So num_contig_ptes() returning 0 is already not really
consumed consistently.
How about we change the default num_contig_ptes() return value to 1 and add a
warning if size is invalid:
---8<---
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 614b2feddba2..b3a7fafe8892 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -100,20 +100,11 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr,
static inline int num_contig_ptes(unsigned long size, size_t *pgsize)
{
- int contig_ptes = 0;
+ int contig_ptes = 1;
*pgsize = size;
switch (size) {
-#ifndef __PAGETABLE_PMD_FOLDED
- case PUD_SIZE:
- if (pud_sect_supported())
- contig_ptes = 1;
- break;
-#endif
- case PMD_SIZE:
- contig_ptes = 1;
- break;
case CONT_PMD_SIZE:
*pgsize = PMD_SIZE;
contig_ptes = CONT_PMDS;
@@ -122,6 +113,8 @@ static inline int num_contig_ptes(unsigned long size, size_t *pgsize)
*pgsize = PAGE_SIZE;
contig_ptes = CONT_PTES;
break;
+ default:
+ WARN_ON(!__hugetlb_valid_size(size));
}
return contig_ptes;
---8<---
>
> Will
* Re: [PATCH v2 0/4] Fixes for hugetlb and vmalloc on arm64
2025-02-21 15:35 ` Will Deacon
@ 2025-02-24 12:13 ` Ryan Roberts
0 siblings, 0 replies; 30+ messages in thread
From: Ryan Roberts @ 2025-02-24 12:13 UTC (permalink / raw)
To: Will Deacon, Catalin Marinas
Cc: Huacai Chen, WANG Xuerui, Thomas Bogendoerfer,
James E.J. Bottomley, Helge Deller, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Naveen N Rao, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Gerald Schaefer,
David S. Miller, Andreas Larsson, Arnd Bergmann, Muchun Song,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
Alexandre Ghiti, linux-arm-kernel, linux-mm, linux-kernel
On 21/02/2025 15:35, Will Deacon wrote:
> On Wed, Feb 19, 2025 at 07:10:32PM +0000, Catalin Marinas wrote:
>> Will, Ryan,
>>
>> On Mon, Feb 17, 2025 at 02:04:13PM +0000, Ryan Roberts wrote:
>>> This series contains some fixes for hugetlb on arm64, and is split out from v1
>>> of a wider series at [1]. While the last patch is technically targeting core-mm
>>> and is not directly related to arm64, I'd like it to go via the arm64 tree so
>>> that the wider performance improvement series (v2 to be posted shortly) that
>>> depends on this series doesn't have to be robust to the fix not being present.
>>>
>>> I've included maintainers/reviewers for all the arches that are (trivially)
>>> touched due to the API changes, hoping for some ACKs.
>>
>> These fixes look fine to me and I think we should get them in for 6.14.
>> I think Andrew was ok with them going in via the arm64 tree:
>>
>> https://lore.kernel.org/linux-arm-kernel/20250205235219.3c3a4b968087d1386d708b04@linux-foundation.org/
>
> I think the diffstat looks worse than it really is, as the arch changes
> are reasonably mechanical. I'd like to let it sit in next for a few days
> though, so I'll pick this up once we've resolved my comment on patch #2.
Sounds good to me; if you're happy with my proposal in the patch 2 thread, then
I'll respin.
>
> Ryan -- did you find all of these issues by inspection, or are you aware
> of anybody hitting them in practice?
All by inspection. So I guess they are not urgent to fix from that perspective.
>
> Will
* Re: [PATCH v2 1/4] mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear()
2025-02-17 14:04 ` [PATCH v2 1/4] mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear() Ryan Roberts
` (2 preceding siblings ...)
2025-02-20 9:46 ` David Hildenbrand
@ 2025-02-25 14:25 ` Alexander Gordeev
2025-02-25 15:43 ` Ryan Roberts
2025-02-26 7:37 ` Christophe Leroy
4 siblings, 1 reply; 30+ messages in thread
From: Alexander Gordeev @ 2025-02-25 14:25 UTC (permalink / raw)
To: Ryan Roberts
Cc: Catalin Marinas, Will Deacon, Huacai Chen, WANG Xuerui,
Thomas Bogendoerfer, James E.J. Bottomley, Helge Deller,
Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Naveen N Rao, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
Sven Schnelle, Gerald Schaefer, David S. Miller, Andreas Larsson,
Arnd Bergmann, Muchun Song, Andrew Morton, Uladzislau Rezki,
Christoph Hellwig, David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
Alexandre Ghiti, linux-arm-kernel, linux-mm, linux-kernel,
stable
On Mon, Feb 17, 2025 at 02:04:14PM +0000, Ryan Roberts wrote:
Hi Ryan,
> In order to fix a bug, arm64 needs to be told the size of the huge page
> for which the huge_pte is being set in huge_ptep_get_and_clear().
> Provide for this by adding an `unsigned long sz` parameter to the
> function. This follows the same pattern as huge_pte_clear() and
> set_huge_pte_at().
>
> This commit makes the required interface modifications to the core mm as
> well as all arches that implement this function (arm64, loongarch, mips,
> parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed
> in a separate commit.
>
> Cc: stable@vger.kernel.org
> Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
...
> diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetlb.h
> index 7c52acaf9f82..420c74306779 100644
> --- a/arch/s390/include/asm/hugetlb.h
> +++ b/arch/s390/include/asm/hugetlb.h
> @@ -26,7 +26,11 @@ void __set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
>
> #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> -pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
> +pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> + unsigned long addr, pte_t *ptep,
> + unsigned long sz);
Please, format parameters similarily to set_huge_pte_at() few lines above.
> +pte_t __huge_ptep_get_and_clear(struct mm_struct *mm,
> + unsigned long addr, pte_t *ptep);
The formatting is broken, but please see below.
> static inline void arch_clear_hugetlb_flags(struct folio *folio)
> {
> @@ -48,7 +52,7 @@ static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
> static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> unsigned long address, pte_t *ptep)
> {
> - return huge_ptep_get_and_clear(vma->vm_mm, address, ptep);
> + return __huge_ptep_get_and_clear(vma->vm_mm, address, ptep);
> }
>
> #define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
> @@ -59,7 +63,7 @@ static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> int changed = !pte_same(huge_ptep_get(vma->vm_mm, addr, ptep), pte);
>
> if (changed) {
> - huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
> + __huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
> __set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
> }
> return changed;
> @@ -69,7 +73,7 @@ static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> unsigned long addr, pte_t *ptep)
> {
> - pte_t pte = huge_ptep_get_and_clear(mm, addr, ptep);
> + pte_t pte = __huge_ptep_get_and_clear(mm, addr, ptep);
>
> __set_huge_pte_at(mm, addr, ptep, pte_wrprotect(pte));
> }
> diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
> index d9ce199953de..52ee8e854195 100644
> --- a/arch/s390/mm/hugetlbpage.c
> +++ b/arch/s390/mm/hugetlbpage.c
> @@ -188,8 +188,8 @@ pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> return __rste_to_pte(pte_val(*ptep));
> }
>
> -pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep)
> +pte_t __huge_ptep_get_and_clear(struct mm_struct *mm,
> + unsigned long addr, pte_t *ptep)
> {
> pte_t pte = huge_ptep_get(mm, addr, ptep);
> pmd_t *pmdp = (pmd_t *) ptep;
> @@ -202,6 +202,12 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> return pte;
> }
>
> +pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> + unsigned long addr, pte_t *ptep, unsigned long sz)
> +{
> + return __huge_ptep_get_and_clear(mm, addr, ptep);
> +}
Is there a reason why this is not a header inline, as other callers of
__huge_ptep_get_and_clear()?
> pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
> unsigned long addr, unsigned long sz)
> {
...
Thanks!
* Re: [PATCH v2 1/4] mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear()
2025-02-25 14:25 ` Alexander Gordeev
@ 2025-02-25 15:43 ` Ryan Roberts
2025-02-26 7:16 ` Alexander Gordeev
0 siblings, 1 reply; 30+ messages in thread
From: Ryan Roberts @ 2025-02-25 15:43 UTC (permalink / raw)
To: Alexander Gordeev
Cc: Catalin Marinas, Will Deacon, Huacai Chen, WANG Xuerui,
Thomas Bogendoerfer, James E.J. Bottomley, Helge Deller,
Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Naveen N Rao, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
Sven Schnelle, Gerald Schaefer, David S. Miller, Andreas Larsson,
Arnd Bergmann, Muchun Song, Andrew Morton, Uladzislau Rezki,
Christoph Hellwig, David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
Alexandre Ghiti, linux-arm-kernel, linux-mm, linux-kernel,
stable
On 25/02/2025 14:25, Alexander Gordeev wrote:
> On Mon, Feb 17, 2025 at 02:04:14PM +0000, Ryan Roberts wrote:
>
> Hi Ryan,
>
>> In order to fix a bug, arm64 needs to be told the size of the huge page
>> for which the huge_pte is being set in huge_ptep_get_and_clear().
>> Provide for this by adding an `unsigned long sz` parameter to the
>> function. This follows the same pattern as huge_pte_clear() and
>> set_huge_pte_at().
>>
>> This commit makes the required interface modifications to the core mm as
>> well as all arches that implement this function (arm64, loongarch, mips,
>> parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed
>> in a separate commit.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
> ...
>> diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetlb.h
>> index 7c52acaf9f82..420c74306779 100644
>> --- a/arch/s390/include/asm/hugetlb.h
>> +++ b/arch/s390/include/asm/hugetlb.h
>> @@ -26,7 +26,11 @@ void __set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>> pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
>>
>> #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
>> -pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
>> +pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>> + unsigned long addr, pte_t *ptep,
>> + unsigned long sz);
>
> Please, format parameters similarily to set_huge_pte_at() few lines above.
Apologies. I've fixed this for the next version.
>
>> +pte_t __huge_ptep_get_and_clear(struct mm_struct *mm,
>> + unsigned long addr, pte_t *ptep);
>
> The formatting is broken, but please see below.
Formatting fixed here too.
>
>> static inline void arch_clear_hugetlb_flags(struct folio *folio)
>> {
>> @@ -48,7 +52,7 @@ static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
>> static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
>> unsigned long address, pte_t *ptep)
>> {
>> - return huge_ptep_get_and_clear(vma->vm_mm, address, ptep);
>> + return __huge_ptep_get_and_clear(vma->vm_mm, address, ptep);
>> }
>>
>> #define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
>> @@ -59,7 +63,7 @@ static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>> int changed = !pte_same(huge_ptep_get(vma->vm_mm, addr, ptep), pte);
>>
>> if (changed) {
>> - huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
>> + __huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
>> __set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
>> }
>> return changed;
>> @@ -69,7 +73,7 @@ static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>> static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
>> unsigned long addr, pte_t *ptep)
>> {
>> - pte_t pte = huge_ptep_get_and_clear(mm, addr, ptep);
>> + pte_t pte = __huge_ptep_get_and_clear(mm, addr, ptep);
>>
>> __set_huge_pte_at(mm, addr, ptep, pte_wrprotect(pte));
>> }
>> diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
>> index d9ce199953de..52ee8e854195 100644
>> --- a/arch/s390/mm/hugetlbpage.c
>> +++ b/arch/s390/mm/hugetlbpage.c
>> @@ -188,8 +188,8 @@ pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
>> return __rste_to_pte(pte_val(*ptep));
>> }
>>
>> -pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>> - unsigned long addr, pte_t *ptep)
>> +pte_t __huge_ptep_get_and_clear(struct mm_struct *mm,
>> + unsigned long addr, pte_t *ptep)
>> {
>> pte_t pte = huge_ptep_get(mm, addr, ptep);
>> pmd_t *pmdp = (pmd_t *) ptep;
>> @@ -202,6 +202,12 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>> return pte;
>> }
>>
>> +pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>> + unsigned long addr, pte_t *ptep, unsigned long sz)
>> +{
>> + return __huge_ptep_get_and_clear(mm, addr, ptep);
>> +}
>
> Is there a reason why this is not a header inline, as other callers of
> __huge_ptep_get_and_clear()?
I was trying to make the change as non-invasive as possible, so didn't want to
change the linkage in case I accidentally broke something. Happy to make this an
inline in the header though, if you prefer?
Thanks,
Ryan
>
>> pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
>> unsigned long addr, unsigned long sz)
>> {
> ...
>
> Thanks!
* Re: [PATCH v2 2/4] arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes
2025-02-24 12:11 ` Ryan Roberts
@ 2025-02-25 22:18 ` Will Deacon
2025-02-26 8:09 ` Ryan Roberts
0 siblings, 1 reply; 30+ messages in thread
From: Will Deacon @ 2025-02-25 22:18 UTC (permalink / raw)
To: Ryan Roberts
Cc: Catalin Marinas, Huacai Chen, WANG Xuerui, Thomas Bogendoerfer,
James E.J. Bottomley, Helge Deller, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Naveen N Rao, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Gerald Schaefer,
David S. Miller, Andreas Larsson, Arnd Bergmann, Muchun Song,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
Alexandre Ghiti, linux-arm-kernel, linux-mm, linux-kernel,
stable
On Mon, Feb 24, 2025 at 12:11:19PM +0000, Ryan Roberts wrote:
> On 21/02/2025 15:31, Will Deacon wrote:
> > On Mon, Feb 17, 2025 at 02:04:15PM +0000, Ryan Roberts wrote:
> >> + pte = __ptep_get_and_clear(mm, addr, ptep);
> >> + present = pte_present(pte);
> >> + while (--ncontig) {
> >> + ptep++;
> >> + addr += pgsize;
> >> + tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
> >> + if (present) {
> >> + if (pte_dirty(tmp_pte))
> >> + pte = pte_mkdirty(pte);
> >> + if (pte_young(tmp_pte))
> >> + pte = pte_mkyoung(pte);
> >> + }
> >> }
> >
> > nit: With the loop now structured like this, we really can't handle
> > num_contig_ptes() returning 0 if it gets an unknown size. Granted, that
> > really shouldn't happen, but perhaps it would be better to add a 'default'
> > case with a WARN() to num_contig_ptes() and then add an early return here?
>
> Looking at other users of num_contig_ptes() it looks like huge_ptep_get()
> already assumes at least 1 pte (it calls __ptep_get() before calling
> num_contig_ptes()) and set_huge_pte_at() assumes 1 pte for the "present and
> non-contig" case. So num_contig_ptes() returning 0 is already not really
> consumed consistently.
>
> How about we change the default num_contig_ptes() return value to 1 and add a
> warning if size is invalid:
Fine by me!
I assume you'll fold that in and send a new version, along with the typo
fixes?
Cheers,
Will
* Re: [PATCH v2 1/4] mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear()
2025-02-25 15:43 ` Ryan Roberts
@ 2025-02-26 7:16 ` Alexander Gordeev
0 siblings, 0 replies; 30+ messages in thread
From: Alexander Gordeev @ 2025-02-26 7:16 UTC (permalink / raw)
To: Ryan Roberts
Cc: Catalin Marinas, Will Deacon, Huacai Chen, WANG Xuerui,
Thomas Bogendoerfer, James E.J. Bottomley, Helge Deller,
Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Naveen N Rao, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
Sven Schnelle, Gerald Schaefer, David S. Miller, Andreas Larsson,
Arnd Bergmann, Muchun Song, Andrew Morton, Uladzislau Rezki,
Christoph Hellwig, David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
Alexandre Ghiti, linux-arm-kernel, linux-mm, linux-kernel,
stable
On Tue, Feb 25, 2025 at 03:43:04PM +0000, Ryan Roberts wrote:
> >> +pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> >> + unsigned long addr, pte_t *ptep, unsigned long sz)
> >> +{
> >> + return __huge_ptep_get_and_clear(mm, addr, ptep);
> >> +}
> >
> > Is there a reason why this is not a header inline, as other callers of
> > __huge_ptep_get_and_clear()?
>
> I was trying to make the change as uninvasive as possible, so didn't want to
> change the linkage in case I accidentally broke something. Happy to make this an
> inline in the header though, if you prefer?
Yes, please.
> Thanks,
> Ryan
Thanks!
* Re: [PATCH v2 1/4] mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear()
2025-02-17 14:04 ` [PATCH v2 1/4] mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear() Ryan Roberts
` (3 preceding siblings ...)
2025-02-25 14:25 ` Alexander Gordeev
@ 2025-02-26 7:37 ` Christophe Leroy
4 siblings, 0 replies; 30+ messages in thread
From: Christophe Leroy @ 2025-02-26 7:37 UTC (permalink / raw)
To: Ryan Roberts, Catalin Marinas, Will Deacon, Huacai Chen,
WANG Xuerui, Thomas Bogendoerfer, James E.J. Bottomley,
Helge Deller, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Naveen N Rao, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Gerald Schaefer,
David S. Miller, Andreas Larsson, Arnd Bergmann, Muchun Song,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
Alexandre Ghiti
Cc: linux-arm-kernel, linux-mm, linux-kernel, stable
On 17/02/2025 at 15:04, Ryan Roberts wrote:
> In order to fix a bug, arm64 needs to be told the size of the huge page
> for which the huge_pte is being set in huge_ptep_get_and_clear().
> Provide for this by adding an `unsigned long sz` parameter to the
> function. This follows the same pattern as huge_pte_clear() and
> set_huge_pte_at().
>
> This commit makes the required interface modifications to the core mm as
> well as all arches that implement this function (arm64, loongarch, mips,
> parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed
> in a separate commit.
>
> Cc: stable@vger.kernel.org
> Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
I like the idea of adding the page size.
Once page size is added to both huge_ptep_get_and_clear() and
huge_ptep_set_wrprotect(), it will be possible to remove the pagetable
walk in powerpc 8xx pte_update() that is required at the moment to know
if ptep is an 8M cont-PMD entry or a cont-PTE entry for a 512k or 16k page.
> ---
> arch/arm64/include/asm/hugetlb.h | 4 ++--
> arch/arm64/mm/hugetlbpage.c | 8 +++++---
> arch/loongarch/include/asm/hugetlb.h | 6 ++++--
> arch/mips/include/asm/hugetlb.h | 6 ++++--
> arch/parisc/include/asm/hugetlb.h | 2 +-
> arch/parisc/mm/hugetlbpage.c | 2 +-
> arch/powerpc/include/asm/hugetlb.h | 6 ++++--
> arch/riscv/include/asm/hugetlb.h | 3 ++-
> arch/riscv/mm/hugetlbpage.c | 2 +-
> arch/s390/include/asm/hugetlb.h | 12 ++++++++----
> arch/s390/mm/hugetlbpage.c | 10 ++++++++--
> arch/sparc/include/asm/hugetlb.h | 2 +-
> arch/sparc/mm/hugetlbpage.c | 2 +-
> include/asm-generic/hugetlb.h | 2 +-
> include/linux/hugetlb.h | 4 +++-
> mm/hugetlb.c | 4 ++--
> 16 files changed, 48 insertions(+), 27 deletions(-)
>
> diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
> index c6dff3e69539..03db9cb21ace 100644
> --- a/arch/arm64/include/asm/hugetlb.h
> +++ b/arch/arm64/include/asm/hugetlb.h
> @@ -42,8 +42,8 @@ extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep,
> pte_t pte, int dirty);
> #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> -extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep);
> +extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
> + pte_t *ptep, unsigned long sz);
> #define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
> extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
> unsigned long addr, pte_t *ptep);
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 98a2a0e64e25..06db4649af91 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -396,8 +396,8 @@ void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
> __pte_clear(mm, addr, ptep);
> }
>
> -pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep)
> +pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
> + pte_t *ptep, unsigned long sz)
> {
> int ncontig;
> size_t pgsize;
> @@ -549,6 +549,8 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
>
> pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep)
> {
> + unsigned long psize = huge_page_size(hstate_vma(vma));
> +
> if (alternative_has_cap_unlikely(ARM64_WORKAROUND_2645198)) {
> /*
> * Break-before-make (BBM) is required for all user space mappings
> @@ -558,7 +560,7 @@ pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr
> if (pte_user_exec(__ptep_get(ptep)))
> return huge_ptep_clear_flush(vma, addr, ptep);
> }
> - return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
> + return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep, psize);
> }
>
> void huge_ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep,
> diff --git a/arch/loongarch/include/asm/hugetlb.h b/arch/loongarch/include/asm/hugetlb.h
> index c8e4057734d0..4dc4b3e04225 100644
> --- a/arch/loongarch/include/asm/hugetlb.h
> +++ b/arch/loongarch/include/asm/hugetlb.h
> @@ -36,7 +36,8 @@ static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
>
> #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep)
> + unsigned long addr, pte_t *ptep,
> + unsigned long sz)
> {
> pte_t clear;
> pte_t pte = ptep_get(ptep);
> @@ -51,8 +52,9 @@ static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep)
> {
> pte_t pte;
> + unsigned long sz = huge_page_size(hstate_vma(vma));
>
> - pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
> + pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep, sz);
> flush_tlb_page(vma, addr);
> return pte;
> }
> diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
> index d0a86ce83de9..fbc71ddcf0f6 100644
> --- a/arch/mips/include/asm/hugetlb.h
> +++ b/arch/mips/include/asm/hugetlb.h
> @@ -27,7 +27,8 @@ static inline int prepare_hugepage_range(struct file *file,
>
> #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep)
> + unsigned long addr, pte_t *ptep,
> + unsigned long sz)
> {
> pte_t clear;
> pte_t pte = *ptep;
> @@ -42,13 +43,14 @@ static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep)
> {
> pte_t pte;
> + unsigned long sz = huge_page_size(hstate_vma(vma));
>
> /*
> * clear the huge pte entry firstly, so that the other smp threads will
> * not get old pte entry after finishing flush_tlb_page and before
> * setting new huge pte entry
> */
> - pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
> + pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep, sz);
> flush_tlb_page(vma, addr);
> return pte;
> }
> diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
> index 5b3a5429f71b..21e9ace17739 100644
> --- a/arch/parisc/include/asm/hugetlb.h
> +++ b/arch/parisc/include/asm/hugetlb.h
> @@ -10,7 +10,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>
> #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
> - pte_t *ptep);
> + pte_t *ptep, unsigned long sz);
>
> #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
> static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> diff --git a/arch/parisc/mm/hugetlbpage.c b/arch/parisc/mm/hugetlbpage.c
> index e9d18cf25b79..a94fe546d434 100644
> --- a/arch/parisc/mm/hugetlbpage.c
> +++ b/arch/parisc/mm/hugetlbpage.c
> @@ -126,7 +126,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>
>
> pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
> - pte_t *ptep)
> + pte_t *ptep, unsigned long sz)
> {
> pte_t entry;
>
> diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
> index dad2e7980f24..86326587e58d 100644
> --- a/arch/powerpc/include/asm/hugetlb.h
> +++ b/arch/powerpc/include/asm/hugetlb.h
> @@ -45,7 +45,8 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
>
> #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep)
> + unsigned long addr, pte_t *ptep,
> + unsigned long sz)
> {
> return __pte(pte_update(mm, addr, ptep, ~0UL, 0, 1));
> }
> @@ -55,8 +56,9 @@ static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep)
> {
> pte_t pte;
> + unsigned long sz = huge_page_size(hstate_vma(vma));
>
> - pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
> + pte = huge_ptep_get_and_clear(vma->vm_mm, addr, ptep, sz);
> flush_hugetlb_page(vma, addr);
> return pte;
> }
> diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h
> index faf3624d8057..446126497768 100644
> --- a/arch/riscv/include/asm/hugetlb.h
> +++ b/arch/riscv/include/asm/hugetlb.h
> @@ -28,7 +28,8 @@ void set_huge_pte_at(struct mm_struct *mm,
>
> #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep);
> + unsigned long addr, pte_t *ptep,
> + unsigned long sz);
>
> #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
> pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
> index 42314f093922..b4a78a4b35cf 100644
> --- a/arch/riscv/mm/hugetlbpage.c
> +++ b/arch/riscv/mm/hugetlbpage.c
> @@ -293,7 +293,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>
> pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> unsigned long addr,
> - pte_t *ptep)
> + pte_t *ptep, unsigned long sz)
> {
> pte_t orig_pte = ptep_get(ptep);
> int pte_num;
> diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetlb.h
> index 7c52acaf9f82..420c74306779 100644
> --- a/arch/s390/include/asm/hugetlb.h
> +++ b/arch/s390/include/asm/hugetlb.h
> @@ -26,7 +26,11 @@ void __set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
>
> #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> -pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
> +pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> + unsigned long addr, pte_t *ptep,
> + unsigned long sz);
> +pte_t __huge_ptep_get_and_clear(struct mm_struct *mm,
> + unsigned long addr, pte_t *ptep);
>
> static inline void arch_clear_hugetlb_flags(struct folio *folio)
> {
> @@ -48,7 +52,7 @@ static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
> static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> unsigned long address, pte_t *ptep)
> {
> - return huge_ptep_get_and_clear(vma->vm_mm, address, ptep);
> + return __huge_ptep_get_and_clear(vma->vm_mm, address, ptep);
> }
>
> #define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
> @@ -59,7 +63,7 @@ static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> int changed = !pte_same(huge_ptep_get(vma->vm_mm, addr, ptep), pte);
>
> if (changed) {
> - huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
> + __huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
> __set_huge_pte_at(vma->vm_mm, addr, ptep, pte);
> }
> return changed;
> @@ -69,7 +73,7 @@ static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> unsigned long addr, pte_t *ptep)
> {
> - pte_t pte = huge_ptep_get_and_clear(mm, addr, ptep);
> + pte_t pte = __huge_ptep_get_and_clear(mm, addr, ptep);
>
> __set_huge_pte_at(mm, addr, ptep, pte_wrprotect(pte));
> }
> diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
> index d9ce199953de..52ee8e854195 100644
> --- a/arch/s390/mm/hugetlbpage.c
> +++ b/arch/s390/mm/hugetlbpage.c
> @@ -188,8 +188,8 @@ pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> return __rste_to_pte(pte_val(*ptep));
> }
>
> -pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep)
> +pte_t __huge_ptep_get_and_clear(struct mm_struct *mm,
> + unsigned long addr, pte_t *ptep)
> {
> pte_t pte = huge_ptep_get(mm, addr, ptep);
> pmd_t *pmdp = (pmd_t *) ptep;
> @@ -202,6 +202,12 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> return pte;
> }
>
> +pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> + unsigned long addr, pte_t *ptep, unsigned long sz)
> +{
> + return __huge_ptep_get_and_clear(mm, addr, ptep);
> +}
> +
> pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
> unsigned long addr, unsigned long sz)
> {
> diff --git a/arch/sparc/include/asm/hugetlb.h b/arch/sparc/include/asm/hugetlb.h
> index c714ca6a05aa..e7a9cdd498dc 100644
> --- a/arch/sparc/include/asm/hugetlb.h
> +++ b/arch/sparc/include/asm/hugetlb.h
> @@ -20,7 +20,7 @@ void __set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>
> #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
> - pte_t *ptep);
> + pte_t *ptep, unsigned long sz);
>
> #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
> static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
> index eee601a0d2cf..80504148d8a5 100644
> --- a/arch/sparc/mm/hugetlbpage.c
> +++ b/arch/sparc/mm/hugetlbpage.c
> @@ -260,7 +260,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> }
>
> pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
> - pte_t *ptep)
> + pte_t *ptep, unsigned long sz)
> {
> unsigned int i, nptes, orig_shift, shift;
> unsigned long size;
> diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
> index f42133dae68e..2afc95bf1655 100644
> --- a/include/asm-generic/hugetlb.h
> +++ b/include/asm-generic/hugetlb.h
> @@ -90,7 +90,7 @@ static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>
> #ifndef __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> - unsigned long addr, pte_t *ptep)
> + unsigned long addr, pte_t *ptep, unsigned long sz)
> {
> return ptep_get_and_clear(mm, addr, ptep);
> }
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index ec8c0ccc8f95..bf5f7256bd28 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -1004,7 +1004,9 @@ static inline void hugetlb_count_sub(long l, struct mm_struct *mm)
> static inline pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep)
> {
> - return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);
> + unsigned long psize = huge_page_size(hstate_vma(vma));
> +
> + return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep, psize);
> }
> #endif
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 65068671e460..de9d49e521c1 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5447,7 +5447,7 @@ static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
> if (src_ptl != dst_ptl)
> spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
>
> - pte = huge_ptep_get_and_clear(mm, old_addr, src_pte);
> + pte = huge_ptep_get_and_clear(mm, old_addr, src_pte, sz);
>
> if (need_clear_uffd_wp && pte_marker_uffd_wp(pte))
> huge_pte_clear(mm, new_addr, dst_pte, sz);
> @@ -5622,7 +5622,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> set_vma_resv_flags(vma, HPAGE_RESV_UNMAPPED);
> }
>
> - pte = huge_ptep_get_and_clear(mm, address, ptep);
> + pte = huge_ptep_get_and_clear(mm, address, ptep, sz);
> tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
> if (huge_pte_dirty(pte))
> set_page_dirty(page);
* Re: [PATCH v2 2/4] arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes
2025-02-25 22:18 ` Will Deacon
@ 2025-02-26 8:09 ` Ryan Roberts
0 siblings, 0 replies; 30+ messages in thread
From: Ryan Roberts @ 2025-02-26 8:09 UTC (permalink / raw)
To: Will Deacon
Cc: Catalin Marinas, Huacai Chen, WANG Xuerui, Thomas Bogendoerfer,
James E.J. Bottomley, Helge Deller, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Naveen N Rao, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Gerald Schaefer,
David S. Miller, Andreas Larsson, Arnd Bergmann, Muchun Song,
Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
David Hildenbrand, Matthew Wilcox (Oracle),
Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
Alexandre Ghiti, linux-arm-kernel, linux-mm, linux-kernel,
stable
On 25/02/2025 22:18, Will Deacon wrote:
> On Mon, Feb 24, 2025 at 12:11:19PM +0000, Ryan Roberts wrote:
>> On 21/02/2025 15:31, Will Deacon wrote:
>>> On Mon, Feb 17, 2025 at 02:04:15PM +0000, Ryan Roberts wrote:
>>>> + pte = __ptep_get_and_clear(mm, addr, ptep);
>>>> + present = pte_present(pte);
>>>> + while (--ncontig) {
>>>> + ptep++;
>>>> + addr += pgsize;
>>>> + tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
>>>> + if (present) {
>>>> + if (pte_dirty(tmp_pte))
>>>> + pte = pte_mkdirty(pte);
>>>> + if (pte_young(tmp_pte))
>>>> + pte = pte_mkyoung(pte);
>>>> + }
>>>> }
>>>
>>> nit: With the loop now structured like this, we really can't handle
>>> num_contig_ptes() returning 0 if it gets an unknown size. Granted, that
>>> really shouldn't happen, but perhaps it would be better to add a 'default'
>>> case with a WARN() to num_contig_ptes() and then add an early return here?
>>
>> Looking at other users of num_contig_ptes() it looks like huge_ptep_get()
>> already assumes at least 1 pte (it calls __ptep_get() before calling
>> num_contig_ptes()) and set_huge_pte_at() assumes 1 pte for the "present and
>> non-contig" case. So num_contig_ptes() returning 0 is already not really
>> consumed consistently.
>>
>> How about we change the default num_contig_ptes() return value to 1 and add a
>> warning if size is invalid:
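[A sketch of what such a num_contig_ptes() with a default of 1 plus a warning might look like — the granule sizes below are assumed arm64 4K-granule values, and this is illustrative, not the actual patch:]

```c
#include <stdio.h>
#include <stddef.h>

#define PAGE_SIZE      (1UL << 12)   /* 4K */
#define CONT_PTE_SIZE  (1UL << 16)   /* 64K: 16 contiguous PTEs (assumed) */
#define PMD_SIZE       (1UL << 21)   /* 2M */
#define CONT_PMD_SIZE  (1UL << 25)   /* 32M: 16 contiguous PMDs (assumed) */
#define PUD_SIZE       (1UL << 30)   /* 1G */

static int num_contig_ptes(unsigned long size, size_t *pgsize)
{
	int contig_ptes = 1;	/* default: behave as a single entry */

	*pgsize = size;

	switch (size) {
	case PUD_SIZE:
	case PMD_SIZE:
		/* one non-contiguous entry at its own level */
		break;
	case CONT_PMD_SIZE:
		*pgsize = PMD_SIZE;
		contig_ptes = CONT_PMD_SIZE / PMD_SIZE;
		break;
	case CONT_PTE_SIZE:
		*pgsize = PAGE_SIZE;
		contig_ptes = CONT_PTE_SIZE / PAGE_SIZE;
		break;
	default:
		/* stands in for WARN_ON() in this user-space sketch */
		fprintf(stderr, "WARN: invalid hugepage size %lu\n", size);
		break;
	}

	return contig_ptes;
}
```

With a guaranteed minimum of 1, callers that unconditionally touch at least one PTE remain safe even for an unexpected size.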
>
> Fine by me!
>
> I assume you'll fold that in and send a new version, along with the typo
> fixes?
Yep, I'll aim to post this today. I have a few review comments for s390 to add
in too.
>
> Cheers,
>
> Will
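[The loop quoted earlier in this message can be modeled in isolation: on a contiguous mapping the hardware may have set the dirty/young bits on any constituent entry, so get-and-clear must accumulate them into the returned pte — but only when the mapping is present. In this sketch the pte is a plain bitmask; the real arm64 code operates on hardware PTEs via __ptep_get_and_clear():]

```c
#include <stdbool.h>

#define PTE_PRESENT (1u << 0)
#define PTE_DIRTY   (1u << 1)
#define PTE_YOUNG   (1u << 2)

typedef unsigned int pte_t;

/*
 * Model of the merge in huge_ptep_get_and_clear(): clear all ncontig
 * entries, returning the first with dirty/young OR-ed in from the
 * rest. For a non-present entry (e.g. a swap or migration entry) the
 * bits have a different meaning, so no merging is done.
 */
static pte_t get_and_clear_contig(pte_t *ptes, int ncontig)
{
	pte_t pte = ptes[0];
	bool present = pte & PTE_PRESENT;
	int i;

	ptes[0] = 0;
	for (i = 1; i < ncontig; i++) {
		pte_t tmp = ptes[i];

		ptes[i] = 0;
		if (present) {
			if (tmp & PTE_DIRTY)
				pte |= PTE_DIRTY;
			if (tmp & PTE_YOUNG)
				pte |= PTE_YOUNG;
		}
	}
	return pte;
}
```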