linux-mm.kvack.org archive mirror
* [PATCH v2 0/3] LoongArch: Fix vmalloc test issue
@ 2024-10-14  3:58 Bibo Mao
  2024-10-14  3:58 ` [PATCH v2 1/3] LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space Bibo Mao
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Bibo Mao @ 2024-10-14  3:58 UTC (permalink / raw)
  To: Huacai Chen, Andrey Ryabinin, Andrew Morton
  Cc: David Hildenbrand, Barry Song, loongarch, linux-kernel,
	kasan-dev, linux-mm

On a LoongArch 3C5000 dual-way machine with 32 CPUs and 128G RAM, there
are some errors when running the vmalloc test with a command like this:
  insmod test_vmalloc.ko   nr_threads=32  run_test_mask=0x3af

Here is part of the error messages and the summary test report for the failed cases:
 WARNING: CPU: 13 PID: 1457 at mm/vmalloc.c:503 vmap_small_pages_range_noflush+0x388/0x510
 CPU: 13 UID: 0 PID: 1457 Comm: vmalloc_test/15 Not tainted 6.12.0-rc2+ #93

 Trying to vfree() nonexistent vm area (000000004dec9ced)
 WARNING: CPU: 3 PID: 1444 at mm/vmalloc.c:3345 vfree+0x1e8/0x4c8
 CPU: 3 UID: 0 PID: 1444 Comm: vmalloc_test/2

 Trying to vfree() bad address (00000000fc7c9da5)
 WARNING: CPU: 10 PID: 1552 at mm/vmalloc.c:3210 remove_vm_area+0x88/0x98
 CPU: 10 UID: 0 PID: 1552 Comm: kworker/u144:3

Summary: long_busy_list_alloc_test passed: 0 failed: 1 repeat: 1 loops: 1000000
Summary: long_busy_list_alloc_test passed: 0 failed: 1 repeat: 1 loops: 1000000
Summary: random_size_alloc_test passed: 0 failed: 1 repeat: 1 loops: 1000000
Summary: random_size_align_alloc_test passed: 0 failed: 1 repeat: 1 loops: 1000000
Summary: kvfree_rcu_2_arg_vmalloc_test passed: 0 failed: 1 repeat: 1 loops: 1000000
Summary: long_busy_list_alloc_test passed: 0 failed: 1 repeat: 1 loops: 1000000
Summary: random_size_align_alloc_test passed: 0 failed: 1 repeat: 1 loops: 1000000
Summary: fix_size_alloc_test passed: 0 failed: 1 repeat: 1 loops: 1000000
Summary: random_size_alloc_test passed: 0 failed: 1 repeat: 1 loops: 1000000
Summary: random_size_align_alloc_test passed: 0 failed: 1 repeat: 1 loops: 1000000
Summary: long_busy_list_alloc_test passed: 0 failed: 1 repeat: 1 loops: 1000000
Summary: random_size_align_alloc_test passed: 0 failed: 1 repeat: 1 loops: 1000000
Summary: long_busy_list_alloc_test passed: 0 failed: 1 repeat: 1 loops: 1000000
Summary: long_busy_list_alloc_test passed: 0 failed: 1 repeat: 1 loops: 1000000
Summary: long_busy_list_alloc_test passed: 0 failed: 1 repeat: 1 loops: 1000000
Summary: kvfree_rcu_2_arg_vmalloc_test passed: 0 failed: 1 repeat: 1 loops: 1000000
Summary: long_busy_list_alloc_test passed: 0 failed: 1 repeat: 1 loops: 1000000
Summary: random_size_alloc_test passed: 0 failed: 1 repeat: 1 loops: 1000000
Summary: kvfree_rcu_1_arg_vmalloc_test passed: 0 failed: 1 repeat: 1 loops: 1000000
Summary: long_busy_list_alloc_test passed: 0 failed: 1 repeat: 1 loops: 1000000
Summary: fix_size_alloc_test passed: 0 failed: 1 repeat: 1 loops: 1000000
Summary: long_busy_list_alloc_test passed: 0 failed: 1 repeat: 1 loops: 1000000

The main problem is that set_pte() and pte_clear() are not atomic: for
the kernel address space these functions modify two consecutive pte
entries to ensure that both entries have the PAGE_GLOBAL bit set, and
there is contention between them. A sketch of the pre-series helpers
follows below.
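
For reference, the pre-series helpers look roughly like this (condensed
from the code removed in patch 3/3 below; the SMP build uses an atomic
AMOR instruction for the buddy update, omitted here). The entry's own
store and the buddy check-and-update are separate steps, so two CPUs
working on adjacent kernel pte entries can interleave between them:

  static inline void set_pte(pte_t *ptep, pte_t pteval)
  {
          WRITE_ONCE(*ptep, pteval);              /* own entry */

          if (pte_val(pteval) & _PAGE_GLOBAL) {
                  pte_t *buddy = ptep_buddy(ptep);
                  /* make the buddy global too, if it is still none */
                  if (pte_none(ptep_get(buddy)))
                          WRITE_ONCE(*buddy, __pte(pte_val(ptep_get(buddy)) | _PAGE_GLOBAL));
          }
  }

  static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
  {
          /* read the buddy first, then write the own entry */
          if (pte_val(ptep_get(ptep_buddy(ptep))) & _PAGE_GLOBAL)
                  set_pte(ptep, __pte(_PAGE_GLOBAL));
          else
                  set_pte(ptep, __pte(0));
  }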

With this patchset, the vmalloc test case passes with the command
  insmod test_vmalloc.ko   nr_threads=32  run_test_mask=0x3af

---
  v1 ... v2:
    1. Solve a compile warning by declaring kernel_pte_init() in the
       header file include/linux/mm.h
    2. Call kernel_pte_init() in zero_pmd_populate() in mm/kasan/init.c
    3. Merge the first two patches into one, since both set the pte
       entry with PAGE_GLOBAL, only in different modules
    4. Remove the atomic operation from pte_clear() and use a generic
       read-and-clear operation; the vmalloc test still passes
    5. Refresh some comment descriptions
---
Bibo Mao (3):
  LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space
  LoongArch: Add barrier between set_pte and memory access
  LoongArch: Remove pte buddy set with set_pte and pte_clear function

 arch/loongarch/include/asm/cacheflush.h | 14 +++++++++-
 arch/loongarch/include/asm/pgalloc.h    | 13 +++++++++
 arch/loongarch/include/asm/pgtable.h    | 36 +++++--------------------
 arch/loongarch/mm/init.c                |  4 ++-
 arch/loongarch/mm/kasan_init.c          |  4 ++-
 arch/loongarch/mm/pgtable.c             | 22 +++++++++++++++
 include/linux/mm.h                      |  1 +
 mm/kasan/init.c                         |  8 +++++-
 mm/sparse-vmemmap.c                     |  5 ++++
 9 files changed, 73 insertions(+), 34 deletions(-)


base-commit: 6485cf5ea253d40d507cd71253c9568c5470cd27
-- 
2.39.3



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 1/3] LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space
  2024-10-14  3:58 [PATCH v2 0/3] LoongArch: Fix vmalloc test issue Bibo Mao
@ 2024-10-14  3:58 ` Bibo Mao
  2024-10-18  3:14   ` Huacai Chen
  2024-10-14  3:58 ` [PATCH v2 2/3] LoongArch: Add barrier between set_pte and memory access Bibo Mao
  2024-10-14  3:58 ` [PATCH v2 3/3] LoongArch: Remove pte buddy set with set_pte and pte_clear function Bibo Mao
  2 siblings, 1 reply; 21+ messages in thread
From: Bibo Mao @ 2024-10-14  3:58 UTC (permalink / raw)
  To: Huacai Chen, Andrey Ryabinin, Andrew Morton
  Cc: David Hildenbrand, Barry Song, loongarch, linux-kernel,
	kasan-dev, linux-mm

Unlike general architectures, one TLB entry covers two pages on
LoongArch systems. For kernel space, both pte entries must have the
PAGE_GLOBAL bit set, otherwise the hardware treats the entry as a
non-global TLB entry, and there are potential problems if a TLB entry
for kernel space is not global. For example, such an entry cannot be
flushed by local_flush_tlb_kernel_range(), which only flushes TLB
entries with the global bit set.

The new function kernel_pte_init() initializes a pte table when it is
created for the kernel address space, so the default initial pte value
is PAGE_GLOBAL rather than zero.

The kernel address space areas, including the fixmap, percpu, vmalloc,
kasan and vmemmap areas, get their default pte entries set with
PAGE_GLOBAL.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
---
 arch/loongarch/include/asm/pgalloc.h | 13 +++++++++++++
 arch/loongarch/include/asm/pgtable.h |  1 +
 arch/loongarch/mm/init.c             |  4 +++-
 arch/loongarch/mm/kasan_init.c       |  4 +++-
 arch/loongarch/mm/pgtable.c          | 22 ++++++++++++++++++++++
 include/linux/mm.h                   |  1 +
 mm/kasan/init.c                      |  8 +++++++-
 mm/sparse-vmemmap.c                  |  5 +++++
 8 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/arch/loongarch/include/asm/pgalloc.h b/arch/loongarch/include/asm/pgalloc.h
index 4e2d6b7ca2ee..b2698c03dc2c 100644
--- a/arch/loongarch/include/asm/pgalloc.h
+++ b/arch/loongarch/include/asm/pgalloc.h
@@ -10,8 +10,21 @@
 
 #define __HAVE_ARCH_PMD_ALLOC_ONE
 #define __HAVE_ARCH_PUD_ALLOC_ONE
+#define __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
 #include <asm-generic/pgalloc.h>
 
+static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
+{
+	pte_t *pte;
+
+	pte = (pte_t *) __get_free_page(GFP_KERNEL);
+	if (!pte)
+		return NULL;
+
+	kernel_pte_init(pte);
+	return pte;
+}
+
 static inline void pmd_populate_kernel(struct mm_struct *mm,
 				       pmd_t *pmd, pte_t *pte)
 {
diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
index 9965f52ef65b..22e3a8f96213 100644
--- a/arch/loongarch/include/asm/pgtable.h
+++ b/arch/loongarch/include/asm/pgtable.h
@@ -269,6 +269,7 @@ extern void set_pmd_at(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp, pm
 extern void pgd_init(void *addr);
 extern void pud_init(void *addr);
 extern void pmd_init(void *addr);
+extern void kernel_pte_init(void *addr);
 
 /*
  * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
diff --git a/arch/loongarch/mm/init.c b/arch/loongarch/mm/init.c
index 8a87a482c8f4..9f26e933a8a3 100644
--- a/arch/loongarch/mm/init.c
+++ b/arch/loongarch/mm/init.c
@@ -198,9 +198,11 @@ pte_t * __init populate_kernel_pte(unsigned long addr)
 	if (!pmd_present(pmdp_get(pmd))) {
 		pte_t *pte;
 
-		pte = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
+		pte = memblock_alloc_raw(PAGE_SIZE, PAGE_SIZE);
 		if (!pte)
 			panic("%s: Failed to allocate memory\n", __func__);
+
+		kernel_pte_init(pte);
 		pmd_populate_kernel(&init_mm, pmd, pte);
 	}
 
diff --git a/arch/loongarch/mm/kasan_init.c b/arch/loongarch/mm/kasan_init.c
index 427d6b1aec09..34988573b0d5 100644
--- a/arch/loongarch/mm/kasan_init.c
+++ b/arch/loongarch/mm/kasan_init.c
@@ -152,6 +152,8 @@ static void __init kasan_pte_populate(pmd_t *pmdp, unsigned long addr,
 		phys_addr_t page_phys = early ?
 					__pa_symbol(kasan_early_shadow_page)
 					      : kasan_alloc_zeroed_page(node);
+		if (!early)
+			kernel_pte_init(__va(page_phys));
 		next = addr + PAGE_SIZE;
 		set_pte(ptep, pfn_pte(__phys_to_pfn(page_phys), PAGE_KERNEL));
 	} while (ptep++, addr = next, addr != end && __pte_none(early, ptep_get(ptep)));
@@ -287,7 +289,7 @@ void __init kasan_init(void)
 		set_pte(&kasan_early_shadow_pte[i],
 			pfn_pte(__phys_to_pfn(__pa_symbol(kasan_early_shadow_page)), PAGE_KERNEL_RO));
 
-	memset(kasan_early_shadow_page, 0, PAGE_SIZE);
+	kernel_pte_init(kasan_early_shadow_page);
 	csr_write64(__pa_symbol(swapper_pg_dir), LOONGARCH_CSR_PGDH);
 	local_flush_tlb_all();
 
diff --git a/arch/loongarch/mm/pgtable.c b/arch/loongarch/mm/pgtable.c
index eb6a29b491a7..228ffc1db0a3 100644
--- a/arch/loongarch/mm/pgtable.c
+++ b/arch/loongarch/mm/pgtable.c
@@ -38,6 +38,28 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 }
 EXPORT_SYMBOL_GPL(pgd_alloc);
 
+void kernel_pte_init(void *addr)
+{
+	unsigned long *p, *end;
+	unsigned long entry;
+
+	entry = (unsigned long)_PAGE_GLOBAL;
+	p = (unsigned long *)addr;
+	end = p + PTRS_PER_PTE;
+
+	do {
+		p[0] = entry;
+		p[1] = entry;
+		p[2] = entry;
+		p[3] = entry;
+		p[4] = entry;
+		p += 8;
+		p[-3] = entry;
+		p[-2] = entry;
+		p[-1] = entry;
+	} while (p != end);
+}
+
 void pgd_init(void *addr)
 {
 	unsigned long *p, *end;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ecf63d2b0582..6909fe059a2c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3818,6 +3818,7 @@ void *sparse_buffer_alloc(unsigned long size);
 struct page * __populate_section_memmap(unsigned long pfn,
 		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
 		struct dev_pagemap *pgmap);
+void kernel_pte_init(void *addr);
 void pmd_init(void *addr);
 void pud_init(void *addr);
 pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index 89895f38f722..ac607c306292 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -106,6 +106,10 @@ static void __ref zero_pte_populate(pmd_t *pmd, unsigned long addr,
 	}
 }
 
+void __weak __meminit kernel_pte_init(void *addr)
+{
+}
+
 static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
 				unsigned long end)
 {
@@ -126,8 +130,10 @@ static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
 
 			if (slab_is_available())
 				p = pte_alloc_one_kernel(&init_mm);
-			else
+			else {
 				p = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
+				kernel_pte_init(p);
+			}
 			if (!p)
 				return -ENOMEM;
 
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index edcc7a6b0f6f..c0388b2e959d 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -184,6 +184,10 @@ static void * __meminit vmemmap_alloc_block_zero(unsigned long size, int node)
 	return p;
 }
 
+void __weak __meminit kernel_pte_init(void *addr)
+{
+}
+
 pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
 {
 	pmd_t *pmd = pmd_offset(pud, addr);
@@ -191,6 +195,7 @@ pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
 		void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
 		if (!p)
 			return NULL;
+		kernel_pte_init(p);
 		pmd_populate_kernel(&init_mm, pmd, p);
 	}
 	return pmd;
-- 
2.39.3



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 2/3] LoongArch: Add barrier between set_pte and memory access
  2024-10-14  3:58 [PATCH v2 0/3] LoongArch: Fix vmalloc test issue Bibo Mao
  2024-10-14  3:58 ` [PATCH v2 1/3] LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space Bibo Mao
@ 2024-10-14  3:58 ` Bibo Mao
  2024-10-14  6:31   ` Huacai Chen
  2024-10-14  3:58 ` [PATCH v2 3/3] LoongArch: Remove pte buddy set with set_pte and pte_clear function Bibo Mao
  2 siblings, 1 reply; 21+ messages in thread
From: Bibo Mao @ 2024-10-14  3:58 UTC (permalink / raw)
  To: Huacai Chen, Andrey Ryabinin, Andrew Morton
  Cc: David Hildenbrand, Barry Song, loongarch, linux-kernel,
	kasan-dev, linux-mm

A spurious fault can be taken if memory is accessed right after the
pte is set. For the user address space, the pte is set in kernel mode
and the memory is accessed in user mode, so there is a long window for
synchronization and no barrier is needed. For the kernel address
space, however, memory may be accessed right after the pte is set.

Here flush_cache_vmap()/flush_cache_vmap_early() are used for
synchronization.
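
For context, the generic mapping path calls this hook right after the
page table entries are written, so a barrier placed here orders the pte
stores before any subsequent access through the new mapping. Roughly,
paraphrased from mm/vmalloc.c (details may differ by kernel version):

  static int vmap_pages_range(unsigned long addr, unsigned long end,
                  pgprot_t prot, struct page **pages, unsigned int page_shift)
  {
          int err;

          err = vmap_pages_range_noflush(addr, end, prot, pages, page_shift);
          flush_cache_vmap(addr, end);    /* becomes smp_mb() with this patch */
          return err;
  }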

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
---
 arch/loongarch/include/asm/cacheflush.h | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/loongarch/include/asm/cacheflush.h b/arch/loongarch/include/asm/cacheflush.h
index f8754d08a31a..53be231319ef 100644
--- a/arch/loongarch/include/asm/cacheflush.h
+++ b/arch/loongarch/include/asm/cacheflush.h
@@ -42,12 +42,24 @@ void local_flush_icache_range(unsigned long start, unsigned long end);
 #define flush_cache_dup_mm(mm)				do { } while (0)
 #define flush_cache_range(vma, start, end)		do { } while (0)
 #define flush_cache_page(vma, vmaddr, pfn)		do { } while (0)
-#define flush_cache_vmap(start, end)			do { } while (0)
 #define flush_cache_vunmap(start, end)			do { } while (0)
 #define flush_icache_user_page(vma, page, addr, len)	do { } while (0)
 #define flush_dcache_mmap_lock(mapping)			do { } while (0)
 #define flush_dcache_mmap_unlock(mapping)		do { } while (0)
 
+/*
+ * It is possible for a kernel virtual mapping access to return a spurious
+ * fault if it's accessed right after the pte is set. The page fault handler
+ * does not expect this type of fault. flush_cache_vmap is not exactly the
+ * right place to put this, but it seems to work well enough.
+ */
+static inline void flush_cache_vmap(unsigned long start, unsigned long end)
+{
+	smp_mb();
+}
+#define flush_cache_vmap flush_cache_vmap
+#define flush_cache_vmap_early	flush_cache_vmap
+
 #define cache_op(op, addr)						\
 	__asm__ __volatile__(						\
 	"	cacop	%0, %1					\n"	\
-- 
2.39.3



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 3/3] LoongArch: Remove pte buddy set with set_pte and pte_clear function
  2024-10-14  3:58 [PATCH v2 0/3] LoongArch: Fix vmalloc test issue Bibo Mao
  2024-10-14  3:58 ` [PATCH v2 1/3] LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space Bibo Mao
  2024-10-14  3:58 ` [PATCH v2 2/3] LoongArch: Add barrier between set_pte and memory access Bibo Mao
@ 2024-10-14  3:58 ` Bibo Mao
  2024-10-14  6:33   ` Huacai Chen
  2 siblings, 1 reply; 21+ messages in thread
From: Bibo Mao @ 2024-10-14  3:58 UTC (permalink / raw)
  To: Huacai Chen, Andrey Ryabinin, Andrew Morton
  Cc: David Hildenbrand, Barry Song, loongarch, linux-kernel,
	kasan-dev, linux-mm

For the kernel address space on LoongArch, both of the two consecutive
page table entries covered by one TLB entry should have the PAGE_GLOBAL
bit set. So set_pte() and pte_clear() check and set the pte buddy entry
in addition to the entry's own pte. However, setting both pte entries
is not an atomic operation, and this causes the failures seen with the
test_vmalloc test case.

With the previous patch, all page table entries are set with the
PAGE_GLOBAL bit when the pte table is created. set_pte() and
pte_clear() now only need to update their own pte entry and no longer
touch the pte buddy entry.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
---
 arch/loongarch/include/asm/pgtable.h | 35 ++++------------------------
 1 file changed, 5 insertions(+), 30 deletions(-)

diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
index 22e3a8f96213..bc29c95b1710 100644
--- a/arch/loongarch/include/asm/pgtable.h
+++ b/arch/loongarch/include/asm/pgtable.h
@@ -325,40 +325,15 @@ extern void paging_init(void);
 static inline void set_pte(pte_t *ptep, pte_t pteval)
 {
 	WRITE_ONCE(*ptep, pteval);
-
-	if (pte_val(pteval) & _PAGE_GLOBAL) {
-		pte_t *buddy = ptep_buddy(ptep);
-		/*
-		 * Make sure the buddy is global too (if it's !none,
-		 * it better already be global)
-		 */
-		if (pte_none(ptep_get(buddy))) {
-#ifdef CONFIG_SMP
-			/*
-			 * For SMP, multiple CPUs can race, so we need
-			 * to do this atomically.
-			 */
-			__asm__ __volatile__(
-			__AMOR "$zero, %[global], %[buddy] \n"
-			: [buddy] "+ZB" (buddy->pte)
-			: [global] "r" (_PAGE_GLOBAL)
-			: "memory");
-
-			DBAR(0b11000); /* o_wrw = 0b11000 */
-#else /* !CONFIG_SMP */
-			WRITE_ONCE(*buddy, __pte(pte_val(ptep_get(buddy)) | _PAGE_GLOBAL));
-#endif /* CONFIG_SMP */
-		}
-	}
 }
 
 static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 {
-	/* Preserve global status for the pair */
-	if (pte_val(ptep_get(ptep_buddy(ptep))) & _PAGE_GLOBAL)
-		set_pte(ptep, __pte(_PAGE_GLOBAL));
-	else
-		set_pte(ptep, __pte(0));
+	pte_t pte;
+
+	pte = ptep_get(ptep);
+	pte_val(pte) &= _PAGE_GLOBAL;
+	set_pte(ptep, pte);
 }
 
 #define PGD_T_LOG2	(__builtin_ffs(sizeof(pgd_t)) - 1)
-- 
2.39.3



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] LoongArch: Add barrier between set_pte and memory access
  2024-10-14  3:58 ` [PATCH v2 2/3] LoongArch: Add barrier between set_pte and memory access Bibo Mao
@ 2024-10-14  6:31   ` Huacai Chen
  2024-10-15  2:53     ` maobibo
  0 siblings, 1 reply; 21+ messages in thread
From: Huacai Chen @ 2024-10-14  6:31 UTC (permalink / raw)
  To: Bibo Mao
  Cc: Andrey Ryabinin, Andrew Morton, David Hildenbrand, Barry Song,
	loongarch, linux-kernel, kasan-dev, linux-mm

Hi, Bibo,

On Mon, Oct 14, 2024 at 11:59 AM Bibo Mao <maobibo@loongson.cn> wrote:
>
> It is possible to return a spurious fault if memory is accessed
> right after the pte is set. For user address space, pte is set
> in kernel space and memory is accessed in user space, there is
> long time for synchronization, no barrier needed. However for
> kernel address space, it is possible that memory is accessed
> right after the pte is set.
>
> Here flush_cache_vmap/flush_cache_vmap_early is used for
> synchronization.
>
> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> ---
>  arch/loongarch/include/asm/cacheflush.h | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/arch/loongarch/include/asm/cacheflush.h b/arch/loongarch/include/asm/cacheflush.h
> index f8754d08a31a..53be231319ef 100644
> --- a/arch/loongarch/include/asm/cacheflush.h
> +++ b/arch/loongarch/include/asm/cacheflush.h
> @@ -42,12 +42,24 @@ void local_flush_icache_range(unsigned long start, unsigned long end);
>  #define flush_cache_dup_mm(mm)                         do { } while (0)
>  #define flush_cache_range(vma, start, end)             do { } while (0)
>  #define flush_cache_page(vma, vmaddr, pfn)             do { } while (0)
> -#define flush_cache_vmap(start, end)                   do { } while (0)
>  #define flush_cache_vunmap(start, end)                 do { } while (0)
>  #define flush_icache_user_page(vma, page, addr, len)   do { } while (0)
>  #define flush_dcache_mmap_lock(mapping)                        do { } while (0)
>  #define flush_dcache_mmap_unlock(mapping)              do { } while (0)
>
> +/*
> + * It is possible for a kernel virtual mapping access to return a spurious
> + * fault if it's accessed right after the pte is set. The page fault handler
> + * does not expect this type of fault. flush_cache_vmap is not exactly the
> + * right place to put this, but it seems to work well enough.
> + */
> +static inline void flush_cache_vmap(unsigned long start, unsigned long end)
> +{
> +       smp_mb();
> +}
> +#define flush_cache_vmap flush_cache_vmap
> +#define flush_cache_vmap_early flush_cache_vmap
From the history of flush_cache_vmap_early(), it seems that only archs
with a "virtual cache" (VIVT or VIPT) need this API, so LoongArch can
make it a no-op here.

And I still think flush_cache_vunmap() should be a smp_mb(). A
smp_mb() in flush_cache_vmap() prevents subsequent accesses from being
reordered before set_pte(), and a smp_mb() in flush_cache_vunmap()
prevents preceding accesses from being reordered after pte_clear().
This potential problem may not show up in experiments, but the barrier
is needed in theory.

Huacai

> +
>  #define cache_op(op, addr)                                             \
>         __asm__ __volatile__(                                           \
>         "       cacop   %0, %1                                  \n"     \
> --
> 2.39.3
>
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/3] LoongArch: Remove pte buddy set with set_pte and pte_clear function
  2024-10-14  3:58 ` [PATCH v2 3/3] LoongArch: Remove pte buddy set with set_pte and pte_clear function Bibo Mao
@ 2024-10-14  6:33   ` Huacai Chen
  0 siblings, 0 replies; 21+ messages in thread
From: Huacai Chen @ 2024-10-14  6:33 UTC (permalink / raw)
  To: Bibo Mao
  Cc: Andrey Ryabinin, Andrew Morton, David Hildenbrand, Barry Song,
	loongarch, linux-kernel, kasan-dev, linux-mm

Hi, Bibo,

The old code tried to fix the same problem as the first patch does, so
this patch can also be squashed into the first one (and it is small
enough now).

Others look good to me.

Huacai

On Mon, Oct 14, 2024 at 11:59 AM Bibo Mao <maobibo@loongson.cn> wrote:
>
> For kernel address space area on LoongArch system, both two consecutive
> page table entries should be enabled with PAGE_GLOBAL bit. So with
> function set_pte() and pte_clear(), pte buddy entry is checked and set
> besides its own pte entry. However it is not atomic operation to set both
> two pte entries, there is problem with test_vmalloc test case.
>
> With previous patch, all page table entries are set with PAGE_GLOBAL
> bit at beginning. Only its own pte entry need update with function
> set_pte() and pte_clear(), nothing to do with pte buddy entry.
>
> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> ---
>  arch/loongarch/include/asm/pgtable.h | 35 ++++------------------------
>  1 file changed, 5 insertions(+), 30 deletions(-)
>
> diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
> index 22e3a8f96213..bc29c95b1710 100644
> --- a/arch/loongarch/include/asm/pgtable.h
> +++ b/arch/loongarch/include/asm/pgtable.h
> @@ -325,40 +325,15 @@ extern void paging_init(void);
>  static inline void set_pte(pte_t *ptep, pte_t pteval)
>  {
>         WRITE_ONCE(*ptep, pteval);
> -
> -       if (pte_val(pteval) & _PAGE_GLOBAL) {
> -               pte_t *buddy = ptep_buddy(ptep);
> -               /*
> -                * Make sure the buddy is global too (if it's !none,
> -                * it better already be global)
> -                */
> -               if (pte_none(ptep_get(buddy))) {
> -#ifdef CONFIG_SMP
> -                       /*
> -                        * For SMP, multiple CPUs can race, so we need
> -                        * to do this atomically.
> -                        */
> -                       __asm__ __volatile__(
> -                       __AMOR "$zero, %[global], %[buddy] \n"
> -                       : [buddy] "+ZB" (buddy->pte)
> -                       : [global] "r" (_PAGE_GLOBAL)
> -                       : "memory");
> -
> -                       DBAR(0b11000); /* o_wrw = 0b11000 */
> -#else /* !CONFIG_SMP */
> -                       WRITE_ONCE(*buddy, __pte(pte_val(ptep_get(buddy)) | _PAGE_GLOBAL));
> -#endif /* CONFIG_SMP */
> -               }
> -       }
>  }
>
>  static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
>  {
> -       /* Preserve global status for the pair */
> -       if (pte_val(ptep_get(ptep_buddy(ptep))) & _PAGE_GLOBAL)
> -               set_pte(ptep, __pte(_PAGE_GLOBAL));
> -       else
> -               set_pte(ptep, __pte(0));
> +       pte_t pte;
> +
> +       pte = ptep_get(ptep);
> +       pte_val(pte) &= _PAGE_GLOBAL;
> +       set_pte(ptep, pte);
>  }
>
>  #define PGD_T_LOG2     (__builtin_ffs(sizeof(pgd_t)) - 1)
> --
> 2.39.3
>
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] LoongArch: Add barrier between set_pte and memory access
  2024-10-14  6:31   ` Huacai Chen
@ 2024-10-15  2:53     ` maobibo
  2024-10-15 12:27       ` Huacai Chen
  0 siblings, 1 reply; 21+ messages in thread
From: maobibo @ 2024-10-15  2:53 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Andrey Ryabinin, Andrew Morton, David Hildenbrand, Barry Song,
	loongarch, linux-kernel, kasan-dev, linux-mm



On 2024/10/14 2:31 PM, Huacai Chen wrote:
> Hi, Bibo,
> 
> On Mon, Oct 14, 2024 at 11:59 AM Bibo Mao <maobibo@loongson.cn> wrote:
>>
>> It is possible to return a spurious fault if memory is accessed
>> right after the pte is set. For user address space, pte is set
>> in kernel space and memory is accessed in user space, there is
>> long time for synchronization, no barrier needed. However for
>> kernel address space, it is possible that memory is accessed
>> right after the pte is set.
>>
>> Here flush_cache_vmap/flush_cache_vmap_early is used for
>> synchronization.
>>
>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>> ---
>>   arch/loongarch/include/asm/cacheflush.h | 14 +++++++++++++-
>>   1 file changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/loongarch/include/asm/cacheflush.h b/arch/loongarch/include/asm/cacheflush.h
>> index f8754d08a31a..53be231319ef 100644
>> --- a/arch/loongarch/include/asm/cacheflush.h
>> +++ b/arch/loongarch/include/asm/cacheflush.h
>> @@ -42,12 +42,24 @@ void local_flush_icache_range(unsigned long start, unsigned long end);
>>   #define flush_cache_dup_mm(mm)                         do { } while (0)
>>   #define flush_cache_range(vma, start, end)             do { } while (0)
>>   #define flush_cache_page(vma, vmaddr, pfn)             do { } while (0)
>> -#define flush_cache_vmap(start, end)                   do { } while (0)
>>   #define flush_cache_vunmap(start, end)                 do { } while (0)
>>   #define flush_icache_user_page(vma, page, addr, len)   do { } while (0)
>>   #define flush_dcache_mmap_lock(mapping)                        do { } while (0)
>>   #define flush_dcache_mmap_unlock(mapping)              do { } while (0)
>>
>> +/*
>> + * It is possible for a kernel virtual mapping access to return a spurious
>> + * fault if it's accessed right after the pte is set. The page fault handler
>> + * does not expect this type of fault. flush_cache_vmap is not exactly the
>> + * right place to put this, but it seems to work well enough.
>> + */
>> +static inline void flush_cache_vmap(unsigned long start, unsigned long end)
>> +{
>> +       smp_mb();
>> +}
>> +#define flush_cache_vmap flush_cache_vmap
>> +#define flush_cache_vmap_early flush_cache_vmap
>  From the history of flush_cache_vmap_early(), It seems only archs with
> "virtual cache" (VIVT or VIPT) need this API, so LoongArch can be a
> no-op here.

Here is the usage of flush_cache_vmap_early() in mm/percpu.c: the page
is mapped and then accessed immediately. Do you think it should be a
no-op on LoongArch?

rc = __pcpu_map_pages(unit_addr, &pages[unit * unit_pages],
		      unit_pages);
if (rc < 0)
	panic("failed to map percpu area, err=%d\n", rc);

flush_cache_vmap_early(unit_addr, unit_addr + ai->unit_size);

/* copy static data */
memcpy((void *)unit_addr, __per_cpu_load, ai->static_size);
}


> 
> And I still think flush_cache_vunmap() should be a smp_mb(). A
> smp_mb() in flush_cache_vmap() prevents subsequent accesses be
> reordered before pte_set(), and a smp_mb() in flush_cache_vunmap()
The smp_mb() in flush_cache_vmap() does not prevent reordering. It is
there to flush the pipeline and let the page table walker hardware
synchronize with the data cache.

Take the following example:
   rb = vmap(pages, nr_meta_pages + 2 * nr_data_pages,
                   VM_MAP | VM_USERMAP, PAGE_KERNEL);
   if (rb) {
<<<<<<<<<<< * the statement if (rb) can prevent reordering. Otherwise,
with any API (kmalloc/vmap/vmalloc) followed by memory access, there
would be a reordering issue. *
       kmemleak_not_leak(pages);
       rb->pages = pages;
       rb->nr_pages = nr_pages;
       return rb;
   }

> prevents preceding accesses be reordered after pte_clear(). This
Can you give an example of such usage of flush_cache_vunmap()? Then we
can continue to talk about it; otherwise it is just guessing.

Regards
Bibo Mao
> potential problem may not be seen from experiment, but it is needed in
> theory.
> 
> Huacai
> 
>> +
>>   #define cache_op(op, addr)                                             \
>>          __asm__ __volatile__(                                           \
>>          "       cacop   %0, %1                                  \n"     \
>> --
>> 2.39.3
>>
>>



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] LoongArch: Add barrier between set_pte and memory access
  2024-10-15  2:53     ` maobibo
@ 2024-10-15 12:27       ` Huacai Chen
  2024-10-16  6:09         ` maobibo
  0 siblings, 1 reply; 21+ messages in thread
From: Huacai Chen @ 2024-10-15 12:27 UTC (permalink / raw)
  To: maobibo
  Cc: Andrey Ryabinin, Andrew Morton, David Hildenbrand, Barry Song,
	loongarch, linux-kernel, kasan-dev, linux-mm

On Tue, Oct 15, 2024 at 10:54 AM maobibo <maobibo@loongson.cn> wrote:
>
>
>
> > On 2024/10/14 2:31 PM, Huacai Chen wrote:
> > Hi, Bibo,
> >
> > On Mon, Oct 14, 2024 at 11:59 AM Bibo Mao <maobibo@loongson.cn> wrote:
> >>
> >> It is possible to return a spurious fault if memory is accessed
> >> right after the pte is set. For user address space, pte is set
> >> in kernel space and memory is accessed in user space, there is
> >> long time for synchronization, no barrier needed. However for
> >> kernel address space, it is possible that memory is accessed
> >> right after the pte is set.
> >>
> >> Here flush_cache_vmap/flush_cache_vmap_early is used for
> >> synchronization.
> >>
> >> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> >> ---
> >>   arch/loongarch/include/asm/cacheflush.h | 14 +++++++++++++-
> >>   1 file changed, 13 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/loongarch/include/asm/cacheflush.h b/arch/loongarch/include/asm/cacheflush.h
> >> index f8754d08a31a..53be231319ef 100644
> >> --- a/arch/loongarch/include/asm/cacheflush.h
> >> +++ b/arch/loongarch/include/asm/cacheflush.h
> >> @@ -42,12 +42,24 @@ void local_flush_icache_range(unsigned long start, unsigned long end);
> >>   #define flush_cache_dup_mm(mm)                         do { } while (0)
> >>   #define flush_cache_range(vma, start, end)             do { } while (0)
> >>   #define flush_cache_page(vma, vmaddr, pfn)             do { } while (0)
> >> -#define flush_cache_vmap(start, end)                   do { } while (0)
> >>   #define flush_cache_vunmap(start, end)                 do { } while (0)
> >>   #define flush_icache_user_page(vma, page, addr, len)   do { } while (0)
> >>   #define flush_dcache_mmap_lock(mapping)                        do { } while (0)
> >>   #define flush_dcache_mmap_unlock(mapping)              do { } while (0)
> >>
> >> +/*
> >> + * It is possible for a kernel virtual mapping access to return a spurious
> >> + * fault if it's accessed right after the pte is set. The page fault handler
> >> + * does not expect this type of fault. flush_cache_vmap is not exactly the
> >> + * right place to put this, but it seems to work well enough.
> >> + */
> >> +static inline void flush_cache_vmap(unsigned long start, unsigned long end)
> >> +{
> >> +       smp_mb();
> >> +}
> >> +#define flush_cache_vmap flush_cache_vmap
> >> +#define flush_cache_vmap_early flush_cache_vmap
> >  From the history of flush_cache_vmap_early(), It seems only archs with
> > "virtual cache" (VIVT or VIPT) need this API, so LoongArch can be a
> > no-op here.
OK, flush_cache_vmap_early() also needs smp_mb().

>
> Here is usage about flush_cache_vmap_early in file linux/mm/percpu.c,
> map the page and access it immediately. Do you think it should be noop
> on LoongArch.
>
> rc = __pcpu_map_pages(unit_addr, &pages[unit * unit_pages],
>                                       unit_pages);
> if (rc < 0)
>      panic("failed to map percpu area, err=%d\n", rc);
>      flush_cache_vmap_early(unit_addr, unit_addr + ai->unit_size);
>      /* copy static data */
>      memcpy((void *)unit_addr, __per_cpu_load, ai->static_size);
> }
>
>
> >
> > And I still think flush_cache_vunmap() should be a smp_mb(). A
> > smp_mb() in flush_cache_vmap() prevents subsequent accesses be
> > reordered before pte_set(), and a smp_mb() in flush_cache_vunmap()
> smp_mb() in flush_cache_vmap() does not prevent reorder. It is to flush
> pipeline and let page table walker HW sync with data cache.
>
> For the following example.
>    rb = vmap(pages, nr_meta_pages + 2 * nr_data_pages,
>                    VM_MAP | VM_USERMAP, PAGE_KERNEL);
>    if (rb) {
> <<<<<<<<<<< * the sentence if (rb) can prevent reorder. Otherwise with
> any API kmalloc/vmap/vmalloc and subsequent memory access, there will be
> reorder issu. *
>        kmemleak_not_leak(pages);
>        rb->pages = pages;
>        rb->nr_pages = nr_pages;
>        return rb;
>    }
>
> > prevents preceding accesses be reordered after pte_clear(). This
> Can you give an example about such usage about flush_cache_vunmap()? and
> we can continue to talk about it, else it is just guessing.
Since we cannot reach a consensus, and the flush_cache_* APIs look very
strange for this purpose (yes, I know PowerPC does it like this, but
ARM64 doesn't), I prefer to still use the ARM64 method, which means
adding a dbar in set_pte(). Of course the performance will be a little
worse, but still better than the old version, and it is more robust.

I know you are very busy, so if you have no time you don't need to
send v3; I can just make a small modification to the 3rd patch.


Huacai

>
> Regards
> Bibo Mao
> > potential problem may not be seen from experiment, but it is needed in
> > theory.
> >
> > Huacai
> >
> >> +
> >>   #define cache_op(op, addr)                                             \
> >>          __asm__ __volatile__(                                           \
> >>          "       cacop   %0, %1                                  \n"     \
> >> --
> >> 2.39.3
> >>
> >>
>
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] LoongArch: Add barrier between set_pte and memory access
  2024-10-15 12:27       ` Huacai Chen
@ 2024-10-16  6:09         ` maobibo
  2024-10-16  7:30           ` Huacai Chen
  0 siblings, 1 reply; 21+ messages in thread
From: maobibo @ 2024-10-16  6:09 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Andrey Ryabinin, Andrew Morton, David Hildenbrand, Barry Song,
	loongarch, linux-kernel, kasan-dev, linux-mm



On 2024/10/15 8:27 PM, Huacai Chen wrote:
> On Tue, Oct 15, 2024 at 10:54 AM maobibo <maobibo@loongson.cn> wrote:
>>
>>
>>
> >> On 2024/10/14 2:31 PM, Huacai Chen wrote:
>>> Hi, Bibo,
>>>
>>> On Mon, Oct 14, 2024 at 11:59 AM Bibo Mao <maobibo@loongson.cn> wrote:
>>>>
>>>> It is possible to return a spurious fault if memory is accessed
>>>> right after the pte is set. For user address space, pte is set
>>>> in kernel space and memory is accessed in user space, there is
>>>> long time for synchronization, no barrier needed. However for
>>>> kernel address space, it is possible that memory is accessed
>>>> right after the pte is set.
>>>>
>>>> Here flush_cache_vmap/flush_cache_vmap_early is used for
>>>> synchronization.
>>>>
>>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>>>> ---
>>>>    arch/loongarch/include/asm/cacheflush.h | 14 +++++++++++++-
>>>>    1 file changed, 13 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/loongarch/include/asm/cacheflush.h b/arch/loongarch/include/asm/cacheflush.h
>>>> index f8754d08a31a..53be231319ef 100644
>>>> --- a/arch/loongarch/include/asm/cacheflush.h
>>>> +++ b/arch/loongarch/include/asm/cacheflush.h
>>>> @@ -42,12 +42,24 @@ void local_flush_icache_range(unsigned long start, unsigned long end);
>>>>    #define flush_cache_dup_mm(mm)                         do { } while (0)
>>>>    #define flush_cache_range(vma, start, end)             do { } while (0)
>>>>    #define flush_cache_page(vma, vmaddr, pfn)             do { } while (0)
>>>> -#define flush_cache_vmap(start, end)                   do { } while (0)
>>>>    #define flush_cache_vunmap(start, end)                 do { } while (0)
>>>>    #define flush_icache_user_page(vma, page, addr, len)   do { } while (0)
>>>>    #define flush_dcache_mmap_lock(mapping)                        do { } while (0)
>>>>    #define flush_dcache_mmap_unlock(mapping)              do { } while (0)
>>>>
>>>> +/*
>>>> + * It is possible for a kernel virtual mapping access to return a spurious
>>>> + * fault if it's accessed right after the pte is set. The page fault handler
>>>> + * does not expect this type of fault. flush_cache_vmap is not exactly the
>>>> + * right place to put this, but it seems to work well enough.
>>>> + */
>>>> +static inline void flush_cache_vmap(unsigned long start, unsigned long end)
>>>> +{
>>>> +       smp_mb();
>>>> +}
>>>> +#define flush_cache_vmap flush_cache_vmap
>>>> +#define flush_cache_vmap_early flush_cache_vmap
>>>   From the history of flush_cache_vmap_early(), It seems only archs with
>>> "virtual cache" (VIVT or VIPT) need this API, so LoongArch can be a
>>> no-op here.
> OK,  flush_cache_vmap_early() also needs smp_mb().
> 
>>
>> Here is usage about flush_cache_vmap_early in file linux/mm/percpu.c,
>> map the page and access it immediately. Do you think it should be noop
>> on LoongArch.
>>
>> rc = __pcpu_map_pages(unit_addr, &pages[unit * unit_pages],
>>                                        unit_pages);
>> if (rc < 0)
>>       panic("failed to map percpu area, err=%d\n", rc);
>>       flush_cache_vmap_early(unit_addr, unit_addr + ai->unit_size);
>>       /* copy static data */
>>       memcpy((void *)unit_addr, __per_cpu_load, ai->static_size);
>> }
>>
>>
>>>
>>> And I still think flush_cache_vunmap() should be a smp_mb(). A
>>> smp_mb() in flush_cache_vmap() prevents subsequent accesses be
>>> reordered before pte_set(), and a smp_mb() in flush_cache_vunmap()
>> smp_mb() in flush_cache_vmap() does not prevent reorder. It is to flush
>> pipeline and let page table walker HW sync with data cache.
>>
>> For the following example.
>>     rb = vmap(pages, nr_meta_pages + 2 * nr_data_pages,
>>                     VM_MAP | VM_USERMAP, PAGE_KERNEL);
>>     if (rb) {
>> <<<<<<<<<<< * the sentence if (rb) can prevent reorder. Otherwise with
>> any API kmalloc/vmap/vmalloc and subsequent memory access, there will be
>> reorder issu. *
>>         kmemleak_not_leak(pages);
>>         rb->pages = pages;
>>         rb->nr_pages = nr_pages;
>>         return rb;
>>     }
>>
>>> prevents preceding accesses be reordered after pte_clear(). This
>> Can you give an example about such usage about flush_cache_vunmap()? and
>> we can continue to talk about it, else it is just guessing.
> Since we cannot reach a consensus, and the flush_cache_* API look very
> strange for this purpose (Yes, I know PowerPC does it like this, but
> ARM64 doesn't). I prefer to still use the ARM64 method which means add
> a dbar in set_pte(). Of course the performance will be a little worse,
> but still better than the old version, and it is more robust.
> 
> I know you are very busy, so if you have no time you don't need to
> send V3, I can just do a small modification on the 3rd patch.
No, I will send v3 myself. And I will drop this patch from the
patchset, since actual testing shows the vmalloc test works well even
without it on the 3C5000 dual-way machine. Also, the weak function
kernel_pte_init() will be replaced with an inline function, rebased on
 
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-define-general-function-pxd_init.patch
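
The weak function mentioned above is the generic no-op stub added to
mm/kasan/init.c and mm/sparse-vmemmap.c in patch 1/3, which the
architecture's strong definition overrides at link time. A minimal
sketch of that mechanism, with hypothetical file names and assuming
GCC/Clang weak-symbol semantics:

  /* generic.c - weak default, analogous to the stub in mm/kasan/init.c */
  void __attribute__((weak)) kernel_pte_init(void *addr)
  {
          /* generic no-op */
  }

  /* arch.c - strong definition, analogous to arch/loongarch/mm/pgtable.c;
   * when both objects are linked together, this definition wins. */
  void kernel_pte_init(void *addr)
  {
          /* architecture-specific initialization of the pte table page */
  }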

I dislike the copy-paste method without further understanding :(.
Although I also copy and paste code, at least I try my best to
understand it.

Regards
Bibo Mao
> 
> 
> Huacai
> 
>>
>> Regards
>> Bibo Mao
>>> potential problem may not be seen from experiment, but it is needed in
>>> theory.
>>>
>>> Huacai
>>>
>>>> +
>>>>    #define cache_op(op, addr)                                             \
>>>>           __asm__ __volatile__(                                           \
>>>>           "       cacop   %0, %1                                  \n"     \
>>>> --
>>>> 2.39.3
>>>>
>>>>
>>
>>



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/3] LoongArch: Add barrier between set_pte and memory access
  2024-10-16  6:09         ` maobibo
@ 2024-10-16  7:30           ` Huacai Chen
  0 siblings, 0 replies; 21+ messages in thread
From: Huacai Chen @ 2024-10-16  7:30 UTC (permalink / raw)
  To: maobibo
  Cc: Andrey Ryabinin, Andrew Morton, David Hildenbrand, Barry Song,
	loongarch, linux-kernel, kasan-dev, linux-mm

On Wed, Oct 16, 2024 at 2:09 PM maobibo <maobibo@loongson.cn> wrote:
>
>
>
> > On 2024/10/15 8:27 PM, Huacai Chen wrote:
> > On Tue, Oct 15, 2024 at 10:54 AM maobibo <maobibo@loongson.cn> wrote:
> >>
> >>
> >>
> >> On 2024/10/14 2:31 PM, Huacai Chen wrote:
> >>> Hi, Bibo,
> >>>
> >>> On Mon, Oct 14, 2024 at 11:59 AM Bibo Mao <maobibo@loongson.cn> wrote:
> >>>>
> >>>> It is possible to return a spurious fault if memory is accessed
> >>>> right after the pte is set. For user address space, pte is set
> >>>> in kernel space and memory is accessed in user space, there is
> >>>> long time for synchronization, no barrier needed. However for
> >>>> kernel address space, it is possible that memory is accessed
> >>>> right after the pte is set.
> >>>>
> >>>> Here flush_cache_vmap/flush_cache_vmap_early is used for
> >>>> synchronization.
> >>>>
> >>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> >>>> ---
> >>>>    arch/loongarch/include/asm/cacheflush.h | 14 +++++++++++++-
> >>>>    1 file changed, 13 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/arch/loongarch/include/asm/cacheflush.h b/arch/loongarch/include/asm/cacheflush.h
> >>>> index f8754d08a31a..53be231319ef 100644
> >>>> --- a/arch/loongarch/include/asm/cacheflush.h
> >>>> +++ b/arch/loongarch/include/asm/cacheflush.h
> >>>> @@ -42,12 +42,24 @@ void local_flush_icache_range(unsigned long start, unsigned long end);
> >>>>    #define flush_cache_dup_mm(mm)                         do { } while (0)
> >>>>    #define flush_cache_range(vma, start, end)             do { } while (0)
> >>>>    #define flush_cache_page(vma, vmaddr, pfn)             do { } while (0)
> >>>> -#define flush_cache_vmap(start, end)                   do { } while (0)
> >>>>    #define flush_cache_vunmap(start, end)                 do { } while (0)
> >>>>    #define flush_icache_user_page(vma, page, addr, len)   do { } while (0)
> >>>>    #define flush_dcache_mmap_lock(mapping)                        do { } while (0)
> >>>>    #define flush_dcache_mmap_unlock(mapping)              do { } while (0)
> >>>>
> >>>> +/*
> >>>> + * It is possible for a kernel virtual mapping access to return a spurious
> >>>> + * fault if it's accessed right after the pte is set. The page fault handler
> >>>> + * does not expect this type of fault. flush_cache_vmap is not exactly the
> >>>> + * right place to put this, but it seems to work well enough.
> >>>> + */
> >>>> +static inline void flush_cache_vmap(unsigned long start, unsigned long end)
> >>>> +{
> >>>> +       smp_mb();
> >>>> +}
> >>>> +#define flush_cache_vmap flush_cache_vmap
> >>>> +#define flush_cache_vmap_early flush_cache_vmap
> >>>   From the history of flush_cache_vmap_early(), It seems only archs with
> >>> "virtual cache" (VIVT or VIPT) need this API, so LoongArch can be a
> >>> no-op here.
> > OK,  flush_cache_vmap_early() also needs smp_mb().
> >
> >>
> >> Here is usage about flush_cache_vmap_early in file linux/mm/percpu.c,
> >> map the page and access it immediately. Do you think it should be noop
> >> on LoongArch.
> >>
> >> rc = __pcpu_map_pages(unit_addr, &pages[unit * unit_pages],
> >>                                        unit_pages);
> >> if (rc < 0)
> >>       panic("failed to map percpu area, err=%d\n", rc);
> >>       flush_cache_vmap_early(unit_addr, unit_addr + ai->unit_size);
> >>       /* copy static data */
> >>       memcpy((void *)unit_addr, __per_cpu_load, ai->static_size);
> >> }
> >>
> >>
> >>>
> >>> And I still think flush_cache_vunmap() should be a smp_mb(). A
> >>> smp_mb() in flush_cache_vmap() prevents subsequent accesses be
> >>> reordered before pte_set(), and a smp_mb() in flush_cache_vunmap()
> >> smp_mb() in flush_cache_vmap() does not prevent reorder. It is to flush
> >> pipeline and let page table walker HW sync with data cache.
> >>
> >> For the following example.
> >>     rb = vmap(pages, nr_meta_pages + 2 * nr_data_pages,
> >>                     VM_MAP | VM_USERMAP, PAGE_KERNEL);
> >>     if (rb) {
> >> <<<<<<<<<<< * the sentence if (rb) can prevent reorder. Otherwise with
> >> any API kmalloc/vmap/vmalloc and subsequent memory access, there will be
> >> reorder issu. *
> >>         kmemleak_not_leak(pages);
> >>         rb->pages = pages;
> >>         rb->nr_pages = nr_pages;
> >>         return rb;
> >>     }
> >>
> >>> prevents preceding accesses be reordered after pte_clear(). This
> >> Can you give an example about such usage about flush_cache_vunmap()? and
> >> we can continue to talk about it, else it is just guessing.
> > Since we cannot reach a consensus, and the flush_cache_* API look very
> > strange for this purpose (Yes, I know PowerPC does it like this, but
> > ARM64 doesn't). I prefer to still use the ARM64 method which means add
> > a dbar in set_pte(). Of course the performance will be a little worse,
> > but still better than the old version, and it is more robust.
> >
> > I know you are very busy, so if you have no time you don't need to
> > send V3, I can just do a small modification on the 3rd patch.
> No, I will send V3 by myself. And I will drop the this patch in this
> patchset since by actual test vmalloc_test works well even without this
> patch on 3C5000 Dual-way, also weak function kernel_pte_init will be
> replaced with inline function rebased on
>
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-define-general-function-pxd_init.patch
This patch is in Andrew's mm-unstable branch. As far as I know,
mm-unstable is for next (6.13) and mm-stable is for current (6.12).

But this series is a bugfix, so it is for current (6.12).

>
> I dislike the copy-paste method without further understanding :(,
> although I also copy and paste code, but as least I try best to
> understand it.

I dislike it too. But in order to get this series into 6.12, it is
better to keep the copy-paste, and then update the refactoring patch to
v2 for Andrew (rebasing and dropping is normal for mm-unstable).


Huacai

>
> Regards
> Bibo Mao
> >
> >
> > Huacai
> >
> >>
> >> Regards
> >> Bibo Mao
> >>> potential problem may not be seen from experiment, but it is needed in
> >>> theory.
> >>>
> >>> Huacai
> >>>
> >>>> +
> >>>>    #define cache_op(op, addr)                                             \
> >>>>           __asm__ __volatile__(                                           \
> >>>>           "       cacop   %0, %1                                  \n"     \
> >>>> --
> >>>> 2.39.3
> >>>>
> >>>>
> >>
> >>
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/3] LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space
  2024-10-14  3:58 ` [PATCH v2 1/3] LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space Bibo Mao
@ 2024-10-18  3:14   ` Huacai Chen
  2024-10-18  3:44     ` maobibo
  0 siblings, 1 reply; 21+ messages in thread
From: Huacai Chen @ 2024-10-18  3:14 UTC (permalink / raw)
  To: Bibo Mao
  Cc: Andrey Ryabinin, Andrew Morton, David Hildenbrand, Barry Song,
	loongarch, linux-kernel, kasan-dev, linux-mm

Hi, Bibo,

I applied this patch but dropped the arch/loongarch/mm/kasan_init.c part:
https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson.git/commit/?h=loongarch-next&id=15832255e84494853f543b4c70ced50afc403067

This is because kernel_pte_init() should operate on page-table pages,
not on data pages. You have already handled the page-table pages in
mm/kasan/init.c, and if we don't drop the modification of data pages in
arch/loongarch/mm/kasan_init.c, the kernel fails to boot when KASAN is
enabled.

Huacai

On Mon, Oct 14, 2024 at 11:59 AM Bibo Mao <maobibo@loongson.cn> wrote:
>
> Unlike general architectures, there are two pages in one TLB entry
> on LoongArch system. For kernel space, it requires both two pte
> entries with PAGE_GLOBAL bit set, else HW treats it as non-global
> tlb, there will be potential problems if tlb entry for kernel space
> is not global. Such as fail to flush kernel tlb with function
> local_flush_tlb_kernel_range() which only flush tlb with global bit.
>
> With function kernel_pte_init() added, it can be used to init pte
> table when it is created for kernel address space, and the default
> initial pte value is PAGE_GLOBAL rather than zero at beginning.
>
> Kernel address space areas includes fixmap, percpu, vmalloc, kasan
> and vmemmap areas set default pte entry with PAGE_GLOBAL set.
>
> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> ---
>  arch/loongarch/include/asm/pgalloc.h | 13 +++++++++++++
>  arch/loongarch/include/asm/pgtable.h |  1 +
>  arch/loongarch/mm/init.c             |  4 +++-
>  arch/loongarch/mm/kasan_init.c       |  4 +++-
>  arch/loongarch/mm/pgtable.c          | 22 ++++++++++++++++++++++
>  include/linux/mm.h                   |  1 +
>  mm/kasan/init.c                      |  8 +++++++-
>  mm/sparse-vmemmap.c                  |  5 +++++
>  8 files changed, 55 insertions(+), 3 deletions(-)
>
> diff --git a/arch/loongarch/include/asm/pgalloc.h b/arch/loongarch/include/asm/pgalloc.h
> index 4e2d6b7ca2ee..b2698c03dc2c 100644
> --- a/arch/loongarch/include/asm/pgalloc.h
> +++ b/arch/loongarch/include/asm/pgalloc.h
> @@ -10,8 +10,21 @@
>
>  #define __HAVE_ARCH_PMD_ALLOC_ONE
>  #define __HAVE_ARCH_PUD_ALLOC_ONE
> +#define __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
>  #include <asm-generic/pgalloc.h>
>
> +static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
> +{
> +       pte_t *pte;
> +
> +       pte = (pte_t *) __get_free_page(GFP_KERNEL);
> +       if (!pte)
> +               return NULL;
> +
> +       kernel_pte_init(pte);
> +       return pte;
> +}
> +
>  static inline void pmd_populate_kernel(struct mm_struct *mm,
>                                        pmd_t *pmd, pte_t *pte)
>  {
> diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
> index 9965f52ef65b..22e3a8f96213 100644
> --- a/arch/loongarch/include/asm/pgtable.h
> +++ b/arch/loongarch/include/asm/pgtable.h
> @@ -269,6 +269,7 @@ extern void set_pmd_at(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp, pm
>  extern void pgd_init(void *addr);
>  extern void pud_init(void *addr);
>  extern void pmd_init(void *addr);
> +extern void kernel_pte_init(void *addr);
>
>  /*
>   * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
> diff --git a/arch/loongarch/mm/init.c b/arch/loongarch/mm/init.c
> index 8a87a482c8f4..9f26e933a8a3 100644
> --- a/arch/loongarch/mm/init.c
> +++ b/arch/loongarch/mm/init.c
> @@ -198,9 +198,11 @@ pte_t * __init populate_kernel_pte(unsigned long addr)
>         if (!pmd_present(pmdp_get(pmd))) {
>                 pte_t *pte;
>
> -               pte = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
> +               pte = memblock_alloc_raw(PAGE_SIZE, PAGE_SIZE);
>                 if (!pte)
>                         panic("%s: Failed to allocate memory\n", __func__);
> +
> +               kernel_pte_init(pte);
>                 pmd_populate_kernel(&init_mm, pmd, pte);
>         }
>
> diff --git a/arch/loongarch/mm/kasan_init.c b/arch/loongarch/mm/kasan_init.c
> index 427d6b1aec09..34988573b0d5 100644
> --- a/arch/loongarch/mm/kasan_init.c
> +++ b/arch/loongarch/mm/kasan_init.c
> @@ -152,6 +152,8 @@ static void __init kasan_pte_populate(pmd_t *pmdp, unsigned long addr,
>                 phys_addr_t page_phys = early ?
>                                         __pa_symbol(kasan_early_shadow_page)
>                                               : kasan_alloc_zeroed_page(node);
> +               if (!early)
> +                       kernel_pte_init(__va(page_phys));
>                 next = addr + PAGE_SIZE;
>                 set_pte(ptep, pfn_pte(__phys_to_pfn(page_phys), PAGE_KERNEL));
>         } while (ptep++, addr = next, addr != end && __pte_none(early, ptep_get(ptep)));
> @@ -287,7 +289,7 @@ void __init kasan_init(void)
>                 set_pte(&kasan_early_shadow_pte[i],
>                         pfn_pte(__phys_to_pfn(__pa_symbol(kasan_early_shadow_page)), PAGE_KERNEL_RO));
>
> -       memset(kasan_early_shadow_page, 0, PAGE_SIZE);
> +       kernel_pte_init(kasan_early_shadow_page);
>         csr_write64(__pa_symbol(swapper_pg_dir), LOONGARCH_CSR_PGDH);
>         local_flush_tlb_all();
>
> diff --git a/arch/loongarch/mm/pgtable.c b/arch/loongarch/mm/pgtable.c
> index eb6a29b491a7..228ffc1db0a3 100644
> --- a/arch/loongarch/mm/pgtable.c
> +++ b/arch/loongarch/mm/pgtable.c
> @@ -38,6 +38,28 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
>  }
>  EXPORT_SYMBOL_GPL(pgd_alloc);
>
> +void kernel_pte_init(void *addr)
> +{
> +       unsigned long *p, *end;
> +       unsigned long entry;
> +
> +       entry = (unsigned long)_PAGE_GLOBAL;
> +       p = (unsigned long *)addr;
> +       end = p + PTRS_PER_PTE;
> +
> +       do {
> +               p[0] = entry;
> +               p[1] = entry;
> +               p[2] = entry;
> +               p[3] = entry;
> +               p[4] = entry;
> +               p += 8;
> +               p[-3] = entry;
> +               p[-2] = entry;
> +               p[-1] = entry;
> +       } while (p != end);
> +}
> +
>  void pgd_init(void *addr)
>  {
>         unsigned long *p, *end;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ecf63d2b0582..6909fe059a2c 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3818,6 +3818,7 @@ void *sparse_buffer_alloc(unsigned long size);
>  struct page * __populate_section_memmap(unsigned long pfn,
>                 unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
>                 struct dev_pagemap *pgmap);
> +void kernel_pte_init(void *addr);
>  void pmd_init(void *addr);
>  void pud_init(void *addr);
>  pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
> index 89895f38f722..ac607c306292 100644
> --- a/mm/kasan/init.c
> +++ b/mm/kasan/init.c
> @@ -106,6 +106,10 @@ static void __ref zero_pte_populate(pmd_t *pmd, unsigned long addr,
>         }
>  }
>
> +void __weak __meminit kernel_pte_init(void *addr)
> +{
> +}
> +
>  static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
>                                 unsigned long end)
>  {
> @@ -126,8 +130,10 @@ static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
>
>                         if (slab_is_available())
>                                 p = pte_alloc_one_kernel(&init_mm);
> -                       else
> +                       else {
>                                 p = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
> +                               kernel_pte_init(p);
> +                       }
>                         if (!p)
>                                 return -ENOMEM;
>
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index edcc7a6b0f6f..c0388b2e959d 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -184,6 +184,10 @@ static void * __meminit vmemmap_alloc_block_zero(unsigned long size, int node)
>         return p;
>  }
>
> +void __weak __meminit kernel_pte_init(void *addr)
> +{
> +}
> +
>  pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
>  {
>         pmd_t *pmd = pmd_offset(pud, addr);
> @@ -191,6 +195,7 @@ pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
>                 void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
>                 if (!p)
>                         return NULL;
> +               kernel_pte_init(p);
>                 pmd_populate_kernel(&init_mm, pmd, p);
>         }
>         return pmd;
> --
> 2.39.3
>


^ permalink raw reply	[flat|nested] 21+ messages in thread
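
A reading aid for the kernel_pte_init() hunk in the patch quoted above: the
hand-unrolled store loop simply pre-fills every slot of a freshly allocated
kernel PTE table with _PAGE_GLOBAL instead of zero. A minimal, non-unrolled
sketch of the same behaviour (the helper name is hypothetical; it assumes the
kernel's PTRS_PER_PTE and _PAGE_GLOBAL definitions):

    void kernel_pte_init_sketch(void *addr)
    {
            unsigned long *p = (unsigned long *)addr;
            int i;

            /* every still-empty slot keeps the global bit from the start */
            for (i = 0; i < PTRS_PER_PTE; i++)
                    p[i] = (unsigned long)_PAGE_GLOBAL;
    }

The unrolled version writes eight entries per loop iteration, so it relies on
PTRS_PER_PTE being a multiple of eight; the end result is the same.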

* Re: [PATCH v2 1/3] LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space
  2024-10-18  3:14   ` Huacai Chen
@ 2024-10-18  3:44     ` maobibo
  2024-10-18  4:11       ` Huacai Chen
  0 siblings, 1 reply; 21+ messages in thread
From: maobibo @ 2024-10-18  3:44 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Andrey Ryabinin, Andrew Morton, David Hildenbrand, Barry Song,
	loongarch, linux-kernel, kasan-dev, linux-mm



On 2024/10/18 11:14 AM, Huacai Chen wrote:
> Hi, Bibo,
> 
> I applied this patch but drop the part of arch/loongarch/mm/kasan_init.c:
> https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson.git/commit/?h=loongarch-next&id=15832255e84494853f543b4c70ced50afc403067
> 
> Because kernel_pte_init() should operate on page-table pages, not on
> data pages. You have already handle page-table page in
> mm/kasan/init.c, and if we don't drop the modification on data pages
> in arch/loongarch/mm/kasan_init.c, the kernel fail to boot if KASAN is
> enabled.
> 
static inline void set_pte(pte_t *ptep, pte_t pteval)
  {
  	WRITE_ONCE(*ptep, pteval);
-
-	if (pte_val(pteval) & _PAGE_GLOBAL) {
-		pte_t *buddy = ptep_buddy(ptep);
-		/*
-		 * Make sure the buddy is global too (if it's !none,
-		 * it better already be global)
-		 */
-		if (pte_none(ptep_get(buddy))) {
-#ifdef CONFIG_SMP
-			/*
-			 * For SMP, multiple CPUs can race, so we need
-			 * to do this atomically.
-			 */
-			__asm__ __volatile__(
-			__AMOR "$zero, %[global], %[buddy] \n"
-			: [buddy] "+ZB" (buddy->pte)
-			: [global] "r" (_PAGE_GLOBAL)
-			: "memory");
-
-			DBAR(0b11000); /* o_wrw = 0b11000 */
-#else /* !CONFIG_SMP */
-			WRITE_ONCE(*buddy, __pte(pte_val(ptep_get(buddy)) | _PAGE_GLOBAL));
-#endif /* CONFIG_SMP */
-		}
-	}
+	DBAR(0b11000); /* o_wrw = 0b11000 */
  }

No, please hold on. This issue has existed for about twenty years; do we
need to be in such a hurry now?

Why is DBAR(0b11000) added in set_pte()?

Regards
Bibo Mao
> Huacai
> 
> On Mon, Oct 14, 2024 at 11:59 AM Bibo Mao <maobibo@loongson.cn> wrote:
>>
>> Unlike general architectures, there are two pages in one TLB entry
>> on LoongArch system. For kernel space, it requires both two pte
>> entries with PAGE_GLOBAL bit set, else HW treats it as non-global
>> tlb, there will be potential problems if tlb entry for kernel space
>> is not global. Such as fail to flush kernel tlb with function
>> local_flush_tlb_kernel_range() which only flush tlb with global bit.
>>
>> With function kernel_pte_init() added, it can be used to init pte
>> table when it is created for kernel address space, and the default
>> initial pte value is PAGE_GLOBAL rather than zero at beginning.
>>
>> Kernel address space areas includes fixmap, percpu, vmalloc, kasan
>> and vmemmap areas set default pte entry with PAGE_GLOBAL set.
>>
>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>> ---
>>   arch/loongarch/include/asm/pgalloc.h | 13 +++++++++++++
>>   arch/loongarch/include/asm/pgtable.h |  1 +
>>   arch/loongarch/mm/init.c             |  4 +++-
>>   arch/loongarch/mm/kasan_init.c       |  4 +++-
>>   arch/loongarch/mm/pgtable.c          | 22 ++++++++++++++++++++++
>>   include/linux/mm.h                   |  1 +
>>   mm/kasan/init.c                      |  8 +++++++-
>>   mm/sparse-vmemmap.c                  |  5 +++++
>>   8 files changed, 55 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/loongarch/include/asm/pgalloc.h b/arch/loongarch/include/asm/pgalloc.h
>> index 4e2d6b7ca2ee..b2698c03dc2c 100644
>> --- a/arch/loongarch/include/asm/pgalloc.h
>> +++ b/arch/loongarch/include/asm/pgalloc.h
>> @@ -10,8 +10,21 @@
>>
>>   #define __HAVE_ARCH_PMD_ALLOC_ONE
>>   #define __HAVE_ARCH_PUD_ALLOC_ONE
>> +#define __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
>>   #include <asm-generic/pgalloc.h>
>>
>> +static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
>> +{
>> +       pte_t *pte;
>> +
>> +       pte = (pte_t *) __get_free_page(GFP_KERNEL);
>> +       if (!pte)
>> +               return NULL;
>> +
>> +       kernel_pte_init(pte);
>> +       return pte;
>> +}
>> +
>>   static inline void pmd_populate_kernel(struct mm_struct *mm,
>>                                         pmd_t *pmd, pte_t *pte)
>>   {
>> diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
>> index 9965f52ef65b..22e3a8f96213 100644
>> --- a/arch/loongarch/include/asm/pgtable.h
>> +++ b/arch/loongarch/include/asm/pgtable.h
>> @@ -269,6 +269,7 @@ extern void set_pmd_at(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp, pm
>>   extern void pgd_init(void *addr);
>>   extern void pud_init(void *addr);
>>   extern void pmd_init(void *addr);
>> +extern void kernel_pte_init(void *addr);
>>
>>   /*
>>    * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
>> diff --git a/arch/loongarch/mm/init.c b/arch/loongarch/mm/init.c
>> index 8a87a482c8f4..9f26e933a8a3 100644
>> --- a/arch/loongarch/mm/init.c
>> +++ b/arch/loongarch/mm/init.c
>> @@ -198,9 +198,11 @@ pte_t * __init populate_kernel_pte(unsigned long addr)
>>          if (!pmd_present(pmdp_get(pmd))) {
>>                  pte_t *pte;
>>
>> -               pte = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
>> +               pte = memblock_alloc_raw(PAGE_SIZE, PAGE_SIZE);
>>                  if (!pte)
>>                          panic("%s: Failed to allocate memory\n", __func__);
>> +
>> +               kernel_pte_init(pte);
>>                  pmd_populate_kernel(&init_mm, pmd, pte);
>>          }
>>
>> diff --git a/arch/loongarch/mm/kasan_init.c b/arch/loongarch/mm/kasan_init.c
>> index 427d6b1aec09..34988573b0d5 100644
>> --- a/arch/loongarch/mm/kasan_init.c
>> +++ b/arch/loongarch/mm/kasan_init.c
>> @@ -152,6 +152,8 @@ static void __init kasan_pte_populate(pmd_t *pmdp, unsigned long addr,
>>                  phys_addr_t page_phys = early ?
>>                                          __pa_symbol(kasan_early_shadow_page)
>>                                                : kasan_alloc_zeroed_page(node);
>> +               if (!early)
>> +                       kernel_pte_init(__va(page_phys));
>>                  next = addr + PAGE_SIZE;
>>                  set_pte(ptep, pfn_pte(__phys_to_pfn(page_phys), PAGE_KERNEL));
>>          } while (ptep++, addr = next, addr != end && __pte_none(early, ptep_get(ptep)));
>> @@ -287,7 +289,7 @@ void __init kasan_init(void)
>>                  set_pte(&kasan_early_shadow_pte[i],
>>                          pfn_pte(__phys_to_pfn(__pa_symbol(kasan_early_shadow_page)), PAGE_KERNEL_RO));
>>
>> -       memset(kasan_early_shadow_page, 0, PAGE_SIZE);
>> +       kernel_pte_init(kasan_early_shadow_page);
>>          csr_write64(__pa_symbol(swapper_pg_dir), LOONGARCH_CSR_PGDH);
>>          local_flush_tlb_all();
>>
>> diff --git a/arch/loongarch/mm/pgtable.c b/arch/loongarch/mm/pgtable.c
>> index eb6a29b491a7..228ffc1db0a3 100644
>> --- a/arch/loongarch/mm/pgtable.c
>> +++ b/arch/loongarch/mm/pgtable.c
>> @@ -38,6 +38,28 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
>>   }
>>   EXPORT_SYMBOL_GPL(pgd_alloc);
>>
>> +void kernel_pte_init(void *addr)
>> +{
>> +       unsigned long *p, *end;
>> +       unsigned long entry;
>> +
>> +       entry = (unsigned long)_PAGE_GLOBAL;
>> +       p = (unsigned long *)addr;
>> +       end = p + PTRS_PER_PTE;
>> +
>> +       do {
>> +               p[0] = entry;
>> +               p[1] = entry;
>> +               p[2] = entry;
>> +               p[3] = entry;
>> +               p[4] = entry;
>> +               p += 8;
>> +               p[-3] = entry;
>> +               p[-2] = entry;
>> +               p[-1] = entry;
>> +       } while (p != end);
>> +}
>> +
>>   void pgd_init(void *addr)
>>   {
>>          unsigned long *p, *end;
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index ecf63d2b0582..6909fe059a2c 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -3818,6 +3818,7 @@ void *sparse_buffer_alloc(unsigned long size);
>>   struct page * __populate_section_memmap(unsigned long pfn,
>>                  unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
>>                  struct dev_pagemap *pgmap);
>> +void kernel_pte_init(void *addr);
>>   void pmd_init(void *addr);
>>   void pud_init(void *addr);
>>   pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
>> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
>> index 89895f38f722..ac607c306292 100644
>> --- a/mm/kasan/init.c
>> +++ b/mm/kasan/init.c
>> @@ -106,6 +106,10 @@ static void __ref zero_pte_populate(pmd_t *pmd, unsigned long addr,
>>          }
>>   }
>>
>> +void __weak __meminit kernel_pte_init(void *addr)
>> +{
>> +}
>> +
>>   static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
>>                                  unsigned long end)
>>   {
>> @@ -126,8 +130,10 @@ static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
>>
>>                          if (slab_is_available())
>>                                  p = pte_alloc_one_kernel(&init_mm);
>> -                       else
>> +                       else {
>>                                  p = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
>> +                               kernel_pte_init(p);
>> +                       }
>>                          if (!p)
>>                                  return -ENOMEM;
>>
>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>> index edcc7a6b0f6f..c0388b2e959d 100644
>> --- a/mm/sparse-vmemmap.c
>> +++ b/mm/sparse-vmemmap.c
>> @@ -184,6 +184,10 @@ static void * __meminit vmemmap_alloc_block_zero(unsigned long size, int node)
>>          return p;
>>   }
>>
>> +void __weak __meminit kernel_pte_init(void *addr)
>> +{
>> +}
>> +
>>   pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
>>   {
>>          pmd_t *pmd = pmd_offset(pud, addr);
>> @@ -191,6 +195,7 @@ pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
>>                  void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
>>                  if (!p)
>>                          return NULL;
>> +               kernel_pte_init(p);
>>                  pmd_populate_kernel(&init_mm, pmd, p);
>>          }
>>          return pmd;
>> --
>> 2.39.3
>>



^ permalink raw reply	[flat|nested] 21+ messages in thread
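
The set_pte() snippet pasted at the top of the message above removes the
"buddy" handling, which only makes sense together with the pairing rule from
the commit message: one LoongArch TLB entry covers an even/odd pair of pages,
and the hardware treats a kernel mapping as global only when _PAGE_GLOBAL is
set in both PTEs of the pair. A small sketch of that pairing (helper names and
the bit value are illustrative placeholders, not taken from the patch):

    #include <stdint.h>

    #define PTE_GLOBAL_BIT  (1UL << 6)  /* placeholder for _PAGE_GLOBAL */

    /* the buddy is the other slot of the even/odd pair, reached by
     * flipping the lowest bit of the PTE's index within the table */
    static inline uint64_t *pte_pair_buddy(uint64_t *ptep)
    {
            return (uint64_t *)((uintptr_t)ptep ^ sizeof(uint64_t));
    }

    /* the global bit only takes effect if both halves of the pair agree */
    static inline int pte_pair_is_global(uint64_t *ptep)
    {
            return (*ptep & PTE_GLOBAL_BIT) &&
                   (*pte_pair_buddy(ptep) & PTE_GLOBAL_BIT);
    }

Pre-initializing kernel PTE tables with _PAGE_GLOBAL keeps this invariant from
the moment the table exists, so set_pte() no longer has to fix up the buddy
entry afterwards.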

* Re: [PATCH v2 1/3] LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space
  2024-10-18  3:44     ` maobibo
@ 2024-10-18  4:11       ` Huacai Chen
  2024-10-18  4:16         ` maobibo
  0 siblings, 1 reply; 21+ messages in thread
From: Huacai Chen @ 2024-10-18  4:11 UTC (permalink / raw)
  To: maobibo
  Cc: Andrey Ryabinin, Andrew Morton, David Hildenbrand, Barry Song,
	loongarch, linux-kernel, kasan-dev, linux-mm

On Fri, Oct 18, 2024 at 11:44 AM maobibo <maobibo@loongson.cn> wrote:
>
>
>
> > On 2024/10/18 11:14 AM, Huacai Chen wrote:
> > Hi, Bibo,
> >
> > I applied this patch but drop the part of arch/loongarch/mm/kasan_init.c:
> > https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson.git/commit/?h=loongarch-next&id=15832255e84494853f543b4c70ced50afc403067
> >
> > Because kernel_pte_init() should operate on page-table pages, not on
> > data pages. You have already handle page-table page in
> > mm/kasan/init.c, and if we don't drop the modification on data pages
> > in arch/loongarch/mm/kasan_init.c, the kernel fail to boot if KASAN is
> > enabled.
> >
> static inline void set_pte(pte_t *ptep, pte_t pteval)
>   {
>         WRITE_ONCE(*ptep, pteval);
> -
> -       if (pte_val(pteval) & _PAGE_GLOBAL) {
> -               pte_t *buddy = ptep_buddy(ptep);
> -               /*
> -                * Make sure the buddy is global too (if it's !none,
> -                * it better already be global)
> -                */
> -               if (pte_none(ptep_get(buddy))) {
> -#ifdef CONFIG_SMP
> -                       /*
> -                        * For SMP, multiple CPUs can race, so we need
> -                        * to do this atomically.
> -                        */
> -                       __asm__ __volatile__(
> -                       __AMOR "$zero, %[global], %[buddy] \n"
> -                       : [buddy] "+ZB" (buddy->pte)
> -                       : [global] "r" (_PAGE_GLOBAL)
> -                       : "memory");
> -
> -                       DBAR(0b11000); /* o_wrw = 0b11000 */
> -#else /* !CONFIG_SMP */
> -                       WRITE_ONCE(*buddy, __pte(pte_val(ptep_get(buddy)) | _PAGE_GLOBAL));
> -#endif /* CONFIG_SMP */
> -               }
> -       }
> +       DBAR(0b11000); /* o_wrw = 0b11000 */
>   }
>
> No, please hold on. This issue exists about twenty years, Do we need be
> in such a hurry now?
>
> why is DBAR(0b11000) added in set_pte()?
It existed before and was not added by this patch. The reason is explained in
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.12-rc3&id=f93f67d06b1023313ef1662eac490e29c025c030

Huacai

>
> Regards
> Bibo Mao
> > Huacai
> >
> > On Mon, Oct 14, 2024 at 11:59 AM Bibo Mao <maobibo@loongson.cn> wrote:
> >>
> >> Unlike general architectures, there are two pages in one TLB entry
> >> on LoongArch system. For kernel space, it requires both two pte
> >> entries with PAGE_GLOBAL bit set, else HW treats it as non-global
> >> tlb, there will be potential problems if tlb entry for kernel space
> >> is not global. Such as fail to flush kernel tlb with function
> >> local_flush_tlb_kernel_range() which only flush tlb with global bit.
> >>
> >> With function kernel_pte_init() added, it can be used to init pte
> >> table when it is created for kernel address space, and the default
> >> initial pte value is PAGE_GLOBAL rather than zero at beginning.
> >>
> >> Kernel address space areas includes fixmap, percpu, vmalloc, kasan
> >> and vmemmap areas set default pte entry with PAGE_GLOBAL set.
> >>
> >> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> >> ---
> >>   arch/loongarch/include/asm/pgalloc.h | 13 +++++++++++++
> >>   arch/loongarch/include/asm/pgtable.h |  1 +
> >>   arch/loongarch/mm/init.c             |  4 +++-
> >>   arch/loongarch/mm/kasan_init.c       |  4 +++-
> >>   arch/loongarch/mm/pgtable.c          | 22 ++++++++++++++++++++++
> >>   include/linux/mm.h                   |  1 +
> >>   mm/kasan/init.c                      |  8 +++++++-
> >>   mm/sparse-vmemmap.c                  |  5 +++++
> >>   8 files changed, 55 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/loongarch/include/asm/pgalloc.h b/arch/loongarch/include/asm/pgalloc.h
> >> index 4e2d6b7ca2ee..b2698c03dc2c 100644
> >> --- a/arch/loongarch/include/asm/pgalloc.h
> >> +++ b/arch/loongarch/include/asm/pgalloc.h
> >> @@ -10,8 +10,21 @@
> >>
> >>   #define __HAVE_ARCH_PMD_ALLOC_ONE
> >>   #define __HAVE_ARCH_PUD_ALLOC_ONE
> >> +#define __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
> >>   #include <asm-generic/pgalloc.h>
> >>
> >> +static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
> >> +{
> >> +       pte_t *pte;
> >> +
> >> +       pte = (pte_t *) __get_free_page(GFP_KERNEL);
> >> +       if (!pte)
> >> +               return NULL;
> >> +
> >> +       kernel_pte_init(pte);
> >> +       return pte;
> >> +}
> >> +
> >>   static inline void pmd_populate_kernel(struct mm_struct *mm,
> >>                                         pmd_t *pmd, pte_t *pte)
> >>   {
> >> diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
> >> index 9965f52ef65b..22e3a8f96213 100644
> >> --- a/arch/loongarch/include/asm/pgtable.h
> >> +++ b/arch/loongarch/include/asm/pgtable.h
> >> @@ -269,6 +269,7 @@ extern void set_pmd_at(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp, pm
> >>   extern void pgd_init(void *addr);
> >>   extern void pud_init(void *addr);
> >>   extern void pmd_init(void *addr);
> >> +extern void kernel_pte_init(void *addr);
> >>
> >>   /*
> >>    * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
> >> diff --git a/arch/loongarch/mm/init.c b/arch/loongarch/mm/init.c
> >> index 8a87a482c8f4..9f26e933a8a3 100644
> >> --- a/arch/loongarch/mm/init.c
> >> +++ b/arch/loongarch/mm/init.c
> >> @@ -198,9 +198,11 @@ pte_t * __init populate_kernel_pte(unsigned long addr)
> >>          if (!pmd_present(pmdp_get(pmd))) {
> >>                  pte_t *pte;
> >>
> >> -               pte = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
> >> +               pte = memblock_alloc_raw(PAGE_SIZE, PAGE_SIZE);
> >>                  if (!pte)
> >>                          panic("%s: Failed to allocate memory\n", __func__);
> >> +
> >> +               kernel_pte_init(pte);
> >>                  pmd_populate_kernel(&init_mm, pmd, pte);
> >>          }
> >>
> >> diff --git a/arch/loongarch/mm/kasan_init.c b/arch/loongarch/mm/kasan_init.c
> >> index 427d6b1aec09..34988573b0d5 100644
> >> --- a/arch/loongarch/mm/kasan_init.c
> >> +++ b/arch/loongarch/mm/kasan_init.c
> >> @@ -152,6 +152,8 @@ static void __init kasan_pte_populate(pmd_t *pmdp, unsigned long addr,
> >>                  phys_addr_t page_phys = early ?
> >>                                          __pa_symbol(kasan_early_shadow_page)
> >>                                                : kasan_alloc_zeroed_page(node);
> >> +               if (!early)
> >> +                       kernel_pte_init(__va(page_phys));
> >>                  next = addr + PAGE_SIZE;
> >>                  set_pte(ptep, pfn_pte(__phys_to_pfn(page_phys), PAGE_KERNEL));
> >>          } while (ptep++, addr = next, addr != end && __pte_none(early, ptep_get(ptep)));
> >> @@ -287,7 +289,7 @@ void __init kasan_init(void)
> >>                  set_pte(&kasan_early_shadow_pte[i],
> >>                          pfn_pte(__phys_to_pfn(__pa_symbol(kasan_early_shadow_page)), PAGE_KERNEL_RO));
> >>
> >> -       memset(kasan_early_shadow_page, 0, PAGE_SIZE);
> >> +       kernel_pte_init(kasan_early_shadow_page);
> >>          csr_write64(__pa_symbol(swapper_pg_dir), LOONGARCH_CSR_PGDH);
> >>          local_flush_tlb_all();
> >>
> >> diff --git a/arch/loongarch/mm/pgtable.c b/arch/loongarch/mm/pgtable.c
> >> index eb6a29b491a7..228ffc1db0a3 100644
> >> --- a/arch/loongarch/mm/pgtable.c
> >> +++ b/arch/loongarch/mm/pgtable.c
> >> @@ -38,6 +38,28 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
> >>   }
> >>   EXPORT_SYMBOL_GPL(pgd_alloc);
> >>
> >> +void kernel_pte_init(void *addr)
> >> +{
> >> +       unsigned long *p, *end;
> >> +       unsigned long entry;
> >> +
> >> +       entry = (unsigned long)_PAGE_GLOBAL;
> >> +       p = (unsigned long *)addr;
> >> +       end = p + PTRS_PER_PTE;
> >> +
> >> +       do {
> >> +               p[0] = entry;
> >> +               p[1] = entry;
> >> +               p[2] = entry;
> >> +               p[3] = entry;
> >> +               p[4] = entry;
> >> +               p += 8;
> >> +               p[-3] = entry;
> >> +               p[-2] = entry;
> >> +               p[-1] = entry;
> >> +       } while (p != end);
> >> +}
> >> +
> >>   void pgd_init(void *addr)
> >>   {
> >>          unsigned long *p, *end;
> >> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >> index ecf63d2b0582..6909fe059a2c 100644
> >> --- a/include/linux/mm.h
> >> +++ b/include/linux/mm.h
> >> @@ -3818,6 +3818,7 @@ void *sparse_buffer_alloc(unsigned long size);
> >>   struct page * __populate_section_memmap(unsigned long pfn,
> >>                  unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
> >>                  struct dev_pagemap *pgmap);
> >> +void kernel_pte_init(void *addr);
> >>   void pmd_init(void *addr);
> >>   void pud_init(void *addr);
> >>   pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
> >> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
> >> index 89895f38f722..ac607c306292 100644
> >> --- a/mm/kasan/init.c
> >> +++ b/mm/kasan/init.c
> >> @@ -106,6 +106,10 @@ static void __ref zero_pte_populate(pmd_t *pmd, unsigned long addr,
> >>          }
> >>   }
> >>
> >> +void __weak __meminit kernel_pte_init(void *addr)
> >> +{
> >> +}
> >> +
> >>   static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
> >>                                  unsigned long end)
> >>   {
> >> @@ -126,8 +130,10 @@ static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
> >>
> >>                          if (slab_is_available())
> >>                                  p = pte_alloc_one_kernel(&init_mm);
> >> -                       else
> >> +                       else {
> >>                                  p = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
> >> +                               kernel_pte_init(p);
> >> +                       }
> >>                          if (!p)
> >>                                  return -ENOMEM;
> >>
> >> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> >> index edcc7a6b0f6f..c0388b2e959d 100644
> >> --- a/mm/sparse-vmemmap.c
> >> +++ b/mm/sparse-vmemmap.c
> >> @@ -184,6 +184,10 @@ static void * __meminit vmemmap_alloc_block_zero(unsigned long size, int node)
> >>          return p;
> >>   }
> >>
> >> +void __weak __meminit kernel_pte_init(void *addr)
> >> +{
> >> +}
> >> +
> >>   pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
> >>   {
> >>          pmd_t *pmd = pmd_offset(pud, addr);
> >> @@ -191,6 +195,7 @@ pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
> >>                  void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
> >>                  if (!p)
> >>                          return NULL;
> >> +               kernel_pte_init(p);
> >>                  pmd_populate_kernel(&init_mm, pmd, p);
> >>          }
> >>          return pmd;
> >> --
> >> 2.39.3
> >>
>
>


^ permalink raw reply	[flat|nested] 21+ messages in thread
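
One mechanism the patch quoted above depends on, worth spelling out: defining
__HAVE_ARCH_PTE_ALLOC_ONE_KERNEL before including asm-generic/pgalloc.h tells
the generic header not to provide its own pte_alloc_one_kernel(), so the
LoongArch override (which calls kernel_pte_init() on the fresh page) is used
instead. The generic header follows the usual guard pattern, roughly:

    /* include/asm-generic/pgalloc.h -- simplified sketch, not verbatim */
    #ifndef __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
    static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
    {
            return __pte_alloc_one_kernel(mm);
    }
    #endif

With the override in place, kernel PTE tables allocated after boot start out
with _PAGE_GLOBAL in every slot, matching the early tables set up through
memblock_alloc_raw() plus kernel_pte_init().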

* Re: [PATCH v2 1/3] LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space
  2024-10-18  4:11       ` Huacai Chen
@ 2024-10-18  4:16         ` maobibo
  2024-10-18  4:23           ` Huacai Chen
  0 siblings, 1 reply; 21+ messages in thread
From: maobibo @ 2024-10-18  4:16 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Andrey Ryabinin, Andrew Morton, David Hildenbrand, Barry Song,
	loongarch, linux-kernel, kasan-dev, linux-mm



On 2024/10/18 12:11 PM, Huacai Chen wrote:
> On Fri, Oct 18, 2024 at 11:44 AM maobibo <maobibo@loongson.cn> wrote:
>>
>>
>>
>> On 2024/10/18 11:14 AM, Huacai Chen wrote:
>>> Hi, Bibo,
>>>
>>> I applied this patch but drop the part of arch/loongarch/mm/kasan_init.c:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson.git/commit/?h=loongarch-next&id=15832255e84494853f543b4c70ced50afc403067
>>>
>>> Because kernel_pte_init() should operate on page-table pages, not on
>>> data pages. You have already handle page-table page in
>>> mm/kasan/init.c, and if we don't drop the modification on data pages
>>> in arch/loongarch/mm/kasan_init.c, the kernel fail to boot if KASAN is
>>> enabled.
>>>
>> static inline void set_pte(pte_t *ptep, pte_t pteval)
>>    {
>>          WRITE_ONCE(*ptep, pteval);
>> -
>> -       if (pte_val(pteval) & _PAGE_GLOBAL) {
>> -               pte_t *buddy = ptep_buddy(ptep);
>> -               /*
>> -                * Make sure the buddy is global too (if it's !none,
>> -                * it better already be global)
>> -                */
>> -               if (pte_none(ptep_get(buddy))) {
>> -#ifdef CONFIG_SMP
>> -                       /*
>> -                        * For SMP, multiple CPUs can race, so we need
>> -                        * to do this atomically.
>> -                        */
>> -                       __asm__ __volatile__(
>> -                       __AMOR "$zero, %[global], %[buddy] \n"
>> -                       : [buddy] "+ZB" (buddy->pte)
>> -                       : [global] "r" (_PAGE_GLOBAL)
>> -                       : "memory");
>> -
>> -                       DBAR(0b11000); /* o_wrw = 0b11000 */
>> -#else /* !CONFIG_SMP */
>> -                       WRITE_ONCE(*buddy, __pte(pte_val(ptep_get(buddy)) | _PAGE_GLOBAL));
>> -#endif /* CONFIG_SMP */
>> -               }
>> -       }
>> +       DBAR(0b11000); /* o_wrw = 0b11000 */
>>    }
>>
>> No, please hold on. This issue exists about twenty years, Do we need be
>> in such a hurry now?
>>
>> why is DBAR(0b11000) added in set_pte()?
> It exists before, not added by this patch. The reason is explained in
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.12-rc3&id=f93f67d06b1023313ef1662eac490e29c025c030
Why would speculative accesses cause a spurious page fault in kernel space
with PTE enabled? Speculative accesses happen everywhere, and they do not
cause spurious page faults.

Obviously you do not understand it, and you wrote a wrong patch.

> 
> Huacai
> 
>>
>> Regards
>> Bibo Mao
>>> Huacai
>>>
>>> On Mon, Oct 14, 2024 at 11:59 AM Bibo Mao <maobibo@loongson.cn> wrote:
>>>>
>>>> Unlike general architectures, there are two pages in one TLB entry
>>>> on LoongArch system. For kernel space, it requires both two pte
>>>> entries with PAGE_GLOBAL bit set, else HW treats it as non-global
>>>> tlb, there will be potential problems if tlb entry for kernel space
>>>> is not global. Such as fail to flush kernel tlb with function
>>>> local_flush_tlb_kernel_range() which only flush tlb with global bit.
>>>>
>>>> With function kernel_pte_init() added, it can be used to init pte
>>>> table when it is created for kernel address space, and the default
>>>> initial pte value is PAGE_GLOBAL rather than zero at beginning.
>>>>
>>>> Kernel address space areas includes fixmap, percpu, vmalloc, kasan
>>>> and vmemmap areas set default pte entry with PAGE_GLOBAL set.
>>>>
>>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>>>> ---
>>>>    arch/loongarch/include/asm/pgalloc.h | 13 +++++++++++++
>>>>    arch/loongarch/include/asm/pgtable.h |  1 +
>>>>    arch/loongarch/mm/init.c             |  4 +++-
>>>>    arch/loongarch/mm/kasan_init.c       |  4 +++-
>>>>    arch/loongarch/mm/pgtable.c          | 22 ++++++++++++++++++++++
>>>>    include/linux/mm.h                   |  1 +
>>>>    mm/kasan/init.c                      |  8 +++++++-
>>>>    mm/sparse-vmemmap.c                  |  5 +++++
>>>>    8 files changed, 55 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/arch/loongarch/include/asm/pgalloc.h b/arch/loongarch/include/asm/pgalloc.h
>>>> index 4e2d6b7ca2ee..b2698c03dc2c 100644
>>>> --- a/arch/loongarch/include/asm/pgalloc.h
>>>> +++ b/arch/loongarch/include/asm/pgalloc.h
>>>> @@ -10,8 +10,21 @@
>>>>
>>>>    #define __HAVE_ARCH_PMD_ALLOC_ONE
>>>>    #define __HAVE_ARCH_PUD_ALLOC_ONE
>>>> +#define __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
>>>>    #include <asm-generic/pgalloc.h>
>>>>
>>>> +static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
>>>> +{
>>>> +       pte_t *pte;
>>>> +
>>>> +       pte = (pte_t *) __get_free_page(GFP_KERNEL);
>>>> +       if (!pte)
>>>> +               return NULL;
>>>> +
>>>> +       kernel_pte_init(pte);
>>>> +       return pte;
>>>> +}
>>>> +
>>>>    static inline void pmd_populate_kernel(struct mm_struct *mm,
>>>>                                          pmd_t *pmd, pte_t *pte)
>>>>    {
>>>> diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
>>>> index 9965f52ef65b..22e3a8f96213 100644
>>>> --- a/arch/loongarch/include/asm/pgtable.h
>>>> +++ b/arch/loongarch/include/asm/pgtable.h
>>>> @@ -269,6 +269,7 @@ extern void set_pmd_at(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp, pm
>>>>    extern void pgd_init(void *addr);
>>>>    extern void pud_init(void *addr);
>>>>    extern void pmd_init(void *addr);
>>>> +extern void kernel_pte_init(void *addr);
>>>>
>>>>    /*
>>>>     * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
>>>> diff --git a/arch/loongarch/mm/init.c b/arch/loongarch/mm/init.c
>>>> index 8a87a482c8f4..9f26e933a8a3 100644
>>>> --- a/arch/loongarch/mm/init.c
>>>> +++ b/arch/loongarch/mm/init.c
>>>> @@ -198,9 +198,11 @@ pte_t * __init populate_kernel_pte(unsigned long addr)
>>>>           if (!pmd_present(pmdp_get(pmd))) {
>>>>                   pte_t *pte;
>>>>
>>>> -               pte = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
>>>> +               pte = memblock_alloc_raw(PAGE_SIZE, PAGE_SIZE);
>>>>                   if (!pte)
>>>>                           panic("%s: Failed to allocate memory\n", __func__);
>>>> +
>>>> +               kernel_pte_init(pte);
>>>>                   pmd_populate_kernel(&init_mm, pmd, pte);
>>>>           }
>>>>
>>>> diff --git a/arch/loongarch/mm/kasan_init.c b/arch/loongarch/mm/kasan_init.c
>>>> index 427d6b1aec09..34988573b0d5 100644
>>>> --- a/arch/loongarch/mm/kasan_init.c
>>>> +++ b/arch/loongarch/mm/kasan_init.c
>>>> @@ -152,6 +152,8 @@ static void __init kasan_pte_populate(pmd_t *pmdp, unsigned long addr,
>>>>                   phys_addr_t page_phys = early ?
>>>>                                           __pa_symbol(kasan_early_shadow_page)
>>>>                                                 : kasan_alloc_zeroed_page(node);
>>>> +               if (!early)
>>>> +                       kernel_pte_init(__va(page_phys));
>>>>                   next = addr + PAGE_SIZE;
>>>>                   set_pte(ptep, pfn_pte(__phys_to_pfn(page_phys), PAGE_KERNEL));
>>>>           } while (ptep++, addr = next, addr != end && __pte_none(early, ptep_get(ptep)));
>>>> @@ -287,7 +289,7 @@ void __init kasan_init(void)
>>>>                   set_pte(&kasan_early_shadow_pte[i],
>>>>                           pfn_pte(__phys_to_pfn(__pa_symbol(kasan_early_shadow_page)), PAGE_KERNEL_RO));
>>>>
>>>> -       memset(kasan_early_shadow_page, 0, PAGE_SIZE);
>>>> +       kernel_pte_init(kasan_early_shadow_page);
>>>>           csr_write64(__pa_symbol(swapper_pg_dir), LOONGARCH_CSR_PGDH);
>>>>           local_flush_tlb_all();
>>>>
>>>> diff --git a/arch/loongarch/mm/pgtable.c b/arch/loongarch/mm/pgtable.c
>>>> index eb6a29b491a7..228ffc1db0a3 100644
>>>> --- a/arch/loongarch/mm/pgtable.c
>>>> +++ b/arch/loongarch/mm/pgtable.c
>>>> @@ -38,6 +38,28 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
>>>>    }
>>>>    EXPORT_SYMBOL_GPL(pgd_alloc);
>>>>
>>>> +void kernel_pte_init(void *addr)
>>>> +{
>>>> +       unsigned long *p, *end;
>>>> +       unsigned long entry;
>>>> +
>>>> +       entry = (unsigned long)_PAGE_GLOBAL;
>>>> +       p = (unsigned long *)addr;
>>>> +       end = p + PTRS_PER_PTE;
>>>> +
>>>> +       do {
>>>> +               p[0] = entry;
>>>> +               p[1] = entry;
>>>> +               p[2] = entry;
>>>> +               p[3] = entry;
>>>> +               p[4] = entry;
>>>> +               p += 8;
>>>> +               p[-3] = entry;
>>>> +               p[-2] = entry;
>>>> +               p[-1] = entry;
>>>> +       } while (p != end);
>>>> +}
>>>> +
>>>>    void pgd_init(void *addr)
>>>>    {
>>>>           unsigned long *p, *end;
>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>> index ecf63d2b0582..6909fe059a2c 100644
>>>> --- a/include/linux/mm.h
>>>> +++ b/include/linux/mm.h
>>>> @@ -3818,6 +3818,7 @@ void *sparse_buffer_alloc(unsigned long size);
>>>>    struct page * __populate_section_memmap(unsigned long pfn,
>>>>                   unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
>>>>                   struct dev_pagemap *pgmap);
>>>> +void kernel_pte_init(void *addr);
>>>>    void pmd_init(void *addr);
>>>>    void pud_init(void *addr);
>>>>    pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
>>>> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
>>>> index 89895f38f722..ac607c306292 100644
>>>> --- a/mm/kasan/init.c
>>>> +++ b/mm/kasan/init.c
>>>> @@ -106,6 +106,10 @@ static void __ref zero_pte_populate(pmd_t *pmd, unsigned long addr,
>>>>           }
>>>>    }
>>>>
>>>> +void __weak __meminit kernel_pte_init(void *addr)
>>>> +{
>>>> +}
>>>> +
>>>>    static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
>>>>                                   unsigned long end)
>>>>    {
>>>> @@ -126,8 +130,10 @@ static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
>>>>
>>>>                           if (slab_is_available())
>>>>                                   p = pte_alloc_one_kernel(&init_mm);
>>>> -                       else
>>>> +                       else {
>>>>                                   p = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
>>>> +                               kernel_pte_init(p);
>>>> +                       }
>>>>                           if (!p)
>>>>                                   return -ENOMEM;
>>>>
>>>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>>>> index edcc7a6b0f6f..c0388b2e959d 100644
>>>> --- a/mm/sparse-vmemmap.c
>>>> +++ b/mm/sparse-vmemmap.c
>>>> @@ -184,6 +184,10 @@ static void * __meminit vmemmap_alloc_block_zero(unsigned long size, int node)
>>>>           return p;
>>>>    }
>>>>
>>>> +void __weak __meminit kernel_pte_init(void *addr)
>>>> +{
>>>> +}
>>>> +
>>>>    pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
>>>>    {
>>>>           pmd_t *pmd = pmd_offset(pud, addr);
>>>> @@ -191,6 +195,7 @@ pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
>>>>                   void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
>>>>                   if (!p)
>>>>                           return NULL;
>>>> +               kernel_pte_init(p);
>>>>                   pmd_populate_kernel(&init_mm, pmd, p);
>>>>           }
>>>>           return pmd;
>>>> --
>>>> 2.39.3
>>>>
>>
>>



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/3] LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space
  2024-10-18  4:16         ` maobibo
@ 2024-10-18  4:23           ` Huacai Chen
  2024-10-18  6:23             ` maobibo
  0 siblings, 1 reply; 21+ messages in thread
From: Huacai Chen @ 2024-10-18  4:23 UTC (permalink / raw)
  To: maobibo, wuruiyang
  Cc: Andrey Ryabinin, Andrew Morton, David Hildenbrand, Barry Song,
	loongarch, linux-kernel, kasan-dev, linux-mm

On Fri, Oct 18, 2024 at 12:16 PM maobibo <maobibo@loongson.cn> wrote:
>
>
>
> > On 2024/10/18 12:11 PM, Huacai Chen wrote:
> > On Fri, Oct 18, 2024 at 11:44 AM maobibo <maobibo@loongson.cn> wrote:
> >>
> >>
> >>
> >> On 2024/10/18 11:14 AM, Huacai Chen wrote:
> >>> Hi, Bibo,
> >>>
> >>> I applied this patch but drop the part of arch/loongarch/mm/kasan_init.c:
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson.git/commit/?h=loongarch-next&id=15832255e84494853f543b4c70ced50afc403067
> >>>
> >>> Because kernel_pte_init() should operate on page-table pages, not on
> >>> data pages. You have already handle page-table page in
> >>> mm/kasan/init.c, and if we don't drop the modification on data pages
> >>> in arch/loongarch/mm/kasan_init.c, the kernel fail to boot if KASAN is
> >>> enabled.
> >>>
> >> static inline void set_pte(pte_t *ptep, pte_t pteval)
> >>    {
> >>          WRITE_ONCE(*ptep, pteval);
> >> -
> >> -       if (pte_val(pteval) & _PAGE_GLOBAL) {
> >> -               pte_t *buddy = ptep_buddy(ptep);
> >> -               /*
> >> -                * Make sure the buddy is global too (if it's !none,
> >> -                * it better already be global)
> >> -                */
> >> -               if (pte_none(ptep_get(buddy))) {
> >> -#ifdef CONFIG_SMP
> >> -                       /*
> >> -                        * For SMP, multiple CPUs can race, so we need
> >> -                        * to do this atomically.
> >> -                        */
> >> -                       __asm__ __volatile__(
> >> -                       __AMOR "$zero, %[global], %[buddy] \n"
> >> -                       : [buddy] "+ZB" (buddy->pte)
> >> -                       : [global] "r" (_PAGE_GLOBAL)
> >> -                       : "memory");
> >> -
> >> -                       DBAR(0b11000); /* o_wrw = 0b11000 */
> >> -#else /* !CONFIG_SMP */
> >> -                       WRITE_ONCE(*buddy, __pte(pte_val(ptep_get(buddy)) | _PAGE_GLOBAL));
> >> -#endif /* CONFIG_SMP */
> >> -               }
> >> -       }
> >> +       DBAR(0b11000); /* o_wrw = 0b11000 */
> >>    }
> >>
> >> No, please hold on. This issue exists about twenty years, Do we need be
> >> in such a hurry now?
> >>
> >> why is DBAR(0b11000) added in set_pte()?
> > It exists before, not added by this patch. The reason is explained in
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.12-rc3&id=f93f67d06b1023313ef1662eac490e29c025c030
> why speculative accesses may cause spurious page fault in kernel space
> with PTE enabled?  speculative accesses exists anywhere, it does not
> cause spurious page fault.
This has been confirmed by Ruiyang Wu. And even if DBAR(0b11000) is wrong,
that would be another patch's mistake, not this one's; this one just keeps
the old behavior.
+CC Ruiyang Wu here.

Huacai

>
> Obvious you do not it and you write wrong patch.
>
> >
> > Huacai
> >
> >>
> >> Regards
> >> Bibo Mao
> >>> Huacai
> >>>
> >>> On Mon, Oct 14, 2024 at 11:59 AM Bibo Mao <maobibo@loongson.cn> wrote:
> >>>>
> >>>> Unlike general architectures, there are two pages in one TLB entry
> >>>> on LoongArch system. For kernel space, it requires both two pte
> >>>> entries with PAGE_GLOBAL bit set, else HW treats it as non-global
> >>>> tlb, there will be potential problems if tlb entry for kernel space
> >>>> is not global. Such as fail to flush kernel tlb with function
> >>>> local_flush_tlb_kernel_range() which only flush tlb with global bit.
> >>>>
> >>>> With function kernel_pte_init() added, it can be used to init pte
> >>>> table when it is created for kernel address space, and the default
> >>>> initial pte value is PAGE_GLOBAL rather than zero at beginning.
> >>>>
> >>>> Kernel address space areas includes fixmap, percpu, vmalloc, kasan
> >>>> and vmemmap areas set default pte entry with PAGE_GLOBAL set.
> >>>>
> >>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> >>>> ---
> >>>>    arch/loongarch/include/asm/pgalloc.h | 13 +++++++++++++
> >>>>    arch/loongarch/include/asm/pgtable.h |  1 +
> >>>>    arch/loongarch/mm/init.c             |  4 +++-
> >>>>    arch/loongarch/mm/kasan_init.c       |  4 +++-
> >>>>    arch/loongarch/mm/pgtable.c          | 22 ++++++++++++++++++++++
> >>>>    include/linux/mm.h                   |  1 +
> >>>>    mm/kasan/init.c                      |  8 +++++++-
> >>>>    mm/sparse-vmemmap.c                  |  5 +++++
> >>>>    8 files changed, 55 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/arch/loongarch/include/asm/pgalloc.h b/arch/loongarch/include/asm/pgalloc.h
> >>>> index 4e2d6b7ca2ee..b2698c03dc2c 100644
> >>>> --- a/arch/loongarch/include/asm/pgalloc.h
> >>>> +++ b/arch/loongarch/include/asm/pgalloc.h
> >>>> @@ -10,8 +10,21 @@
> >>>>
> >>>>    #define __HAVE_ARCH_PMD_ALLOC_ONE
> >>>>    #define __HAVE_ARCH_PUD_ALLOC_ONE
> >>>> +#define __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
> >>>>    #include <asm-generic/pgalloc.h>
> >>>>
> >>>> +static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
> >>>> +{
> >>>> +       pte_t *pte;
> >>>> +
> >>>> +       pte = (pte_t *) __get_free_page(GFP_KERNEL);
> >>>> +       if (!pte)
> >>>> +               return NULL;
> >>>> +
> >>>> +       kernel_pte_init(pte);
> >>>> +       return pte;
> >>>> +}
> >>>> +
> >>>>    static inline void pmd_populate_kernel(struct mm_struct *mm,
> >>>>                                          pmd_t *pmd, pte_t *pte)
> >>>>    {
> >>>> diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
> >>>> index 9965f52ef65b..22e3a8f96213 100644
> >>>> --- a/arch/loongarch/include/asm/pgtable.h
> >>>> +++ b/arch/loongarch/include/asm/pgtable.h
> >>>> @@ -269,6 +269,7 @@ extern void set_pmd_at(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp, pm
> >>>>    extern void pgd_init(void *addr);
> >>>>    extern void pud_init(void *addr);
> >>>>    extern void pmd_init(void *addr);
> >>>> +extern void kernel_pte_init(void *addr);
> >>>>
> >>>>    /*
> >>>>     * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
> >>>> diff --git a/arch/loongarch/mm/init.c b/arch/loongarch/mm/init.c
> >>>> index 8a87a482c8f4..9f26e933a8a3 100644
> >>>> --- a/arch/loongarch/mm/init.c
> >>>> +++ b/arch/loongarch/mm/init.c
> >>>> @@ -198,9 +198,11 @@ pte_t * __init populate_kernel_pte(unsigned long addr)
> >>>>           if (!pmd_present(pmdp_get(pmd))) {
> >>>>                   pte_t *pte;
> >>>>
> >>>> -               pte = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
> >>>> +               pte = memblock_alloc_raw(PAGE_SIZE, PAGE_SIZE);
> >>>>                   if (!pte)
> >>>>                           panic("%s: Failed to allocate memory\n", __func__);
> >>>> +
> >>>> +               kernel_pte_init(pte);
> >>>>                   pmd_populate_kernel(&init_mm, pmd, pte);
> >>>>           }
> >>>>
> >>>> diff --git a/arch/loongarch/mm/kasan_init.c b/arch/loongarch/mm/kasan_init.c
> >>>> index 427d6b1aec09..34988573b0d5 100644
> >>>> --- a/arch/loongarch/mm/kasan_init.c
> >>>> +++ b/arch/loongarch/mm/kasan_init.c
> >>>> @@ -152,6 +152,8 @@ static void __init kasan_pte_populate(pmd_t *pmdp, unsigned long addr,
> >>>>                   phys_addr_t page_phys = early ?
> >>>>                                           __pa_symbol(kasan_early_shadow_page)
> >>>>                                                 : kasan_alloc_zeroed_page(node);
> >>>> +               if (!early)
> >>>> +                       kernel_pte_init(__va(page_phys));
> >>>>                   next = addr + PAGE_SIZE;
> >>>>                   set_pte(ptep, pfn_pte(__phys_to_pfn(page_phys), PAGE_KERNEL));
> >>>>           } while (ptep++, addr = next, addr != end && __pte_none(early, ptep_get(ptep)));
> >>>> @@ -287,7 +289,7 @@ void __init kasan_init(void)
> >>>>                   set_pte(&kasan_early_shadow_pte[i],
> >>>>                           pfn_pte(__phys_to_pfn(__pa_symbol(kasan_early_shadow_page)), PAGE_KERNEL_RO));
> >>>>
> >>>> -       memset(kasan_early_shadow_page, 0, PAGE_SIZE);
> >>>> +       kernel_pte_init(kasan_early_shadow_page);
> >>>>           csr_write64(__pa_symbol(swapper_pg_dir), LOONGARCH_CSR_PGDH);
> >>>>           local_flush_tlb_all();
> >>>>
> >>>> diff --git a/arch/loongarch/mm/pgtable.c b/arch/loongarch/mm/pgtable.c
> >>>> index eb6a29b491a7..228ffc1db0a3 100644
> >>>> --- a/arch/loongarch/mm/pgtable.c
> >>>> +++ b/arch/loongarch/mm/pgtable.c
> >>>> @@ -38,6 +38,28 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
> >>>>    }
> >>>>    EXPORT_SYMBOL_GPL(pgd_alloc);
> >>>>
> >>>> +void kernel_pte_init(void *addr)
> >>>> +{
> >>>> +       unsigned long *p, *end;
> >>>> +       unsigned long entry;
> >>>> +
> >>>> +       entry = (unsigned long)_PAGE_GLOBAL;
> >>>> +       p = (unsigned long *)addr;
> >>>> +       end = p + PTRS_PER_PTE;
> >>>> +
> >>>> +       do {
> >>>> +               p[0] = entry;
> >>>> +               p[1] = entry;
> >>>> +               p[2] = entry;
> >>>> +               p[3] = entry;
> >>>> +               p[4] = entry;
> >>>> +               p += 8;
> >>>> +               p[-3] = entry;
> >>>> +               p[-2] = entry;
> >>>> +               p[-1] = entry;
> >>>> +       } while (p != end);
> >>>> +}
> >>>> +
> >>>>    void pgd_init(void *addr)
> >>>>    {
> >>>>           unsigned long *p, *end;
> >>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >>>> index ecf63d2b0582..6909fe059a2c 100644
> >>>> --- a/include/linux/mm.h
> >>>> +++ b/include/linux/mm.h
> >>>> @@ -3818,6 +3818,7 @@ void *sparse_buffer_alloc(unsigned long size);
> >>>>    struct page * __populate_section_memmap(unsigned long pfn,
> >>>>                   unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
> >>>>                   struct dev_pagemap *pgmap);
> >>>> +void kernel_pte_init(void *addr);
> >>>>    void pmd_init(void *addr);
> >>>>    void pud_init(void *addr);
> >>>>    pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
> >>>> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
> >>>> index 89895f38f722..ac607c306292 100644
> >>>> --- a/mm/kasan/init.c
> >>>> +++ b/mm/kasan/init.c
> >>>> @@ -106,6 +106,10 @@ static void __ref zero_pte_populate(pmd_t *pmd, unsigned long addr,
> >>>>           }
> >>>>    }
> >>>>
> >>>> +void __weak __meminit kernel_pte_init(void *addr)
> >>>> +{
> >>>> +}
> >>>> +
> >>>>    static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
> >>>>                                   unsigned long end)
> >>>>    {
> >>>> @@ -126,8 +130,10 @@ static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
> >>>>
> >>>>                           if (slab_is_available())
> >>>>                                   p = pte_alloc_one_kernel(&init_mm);
> >>>> -                       else
> >>>> +                       else {
> >>>>                                   p = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
> >>>> +                               kernel_pte_init(p);
> >>>> +                       }
> >>>>                           if (!p)
> >>>>                                   return -ENOMEM;
> >>>>
> >>>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> >>>> index edcc7a6b0f6f..c0388b2e959d 100644
> >>>> --- a/mm/sparse-vmemmap.c
> >>>> +++ b/mm/sparse-vmemmap.c
> >>>> @@ -184,6 +184,10 @@ static void * __meminit vmemmap_alloc_block_zero(unsigned long size, int node)
> >>>>           return p;
> >>>>    }
> >>>>
> >>>> +void __weak __meminit kernel_pte_init(void *addr)
> >>>> +{
> >>>> +}
> >>>> +
> >>>>    pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
> >>>>    {
> >>>>           pmd_t *pmd = pmd_offset(pud, addr);
> >>>> @@ -191,6 +195,7 @@ pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
> >>>>                   void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
> >>>>                   if (!p)
> >>>>                           return NULL;
> >>>> +               kernel_pte_init(p);
> >>>>                   pmd_populate_kernel(&init_mm, pmd, p);
> >>>>           }
> >>>>           return pmd;
> >>>> --
> >>>> 2.39.3
> >>>>
> >>
> >>
>
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/3] LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space
  2024-10-18  4:23           ` Huacai Chen
@ 2024-10-18  6:23             ` maobibo
  2024-10-18  6:32               ` Huacai Chen
  0 siblings, 1 reply; 21+ messages in thread
From: maobibo @ 2024-10-18  6:23 UTC (permalink / raw)
  To: Huacai Chen, wuruiyang
  Cc: Andrey Ryabinin, Andrew Morton, David Hildenbrand, Barry Song,
	loongarch, linux-kernel, kasan-dev, linux-mm



On 2024/10/18 12:23 PM, Huacai Chen wrote:
> On Fri, Oct 18, 2024 at 12:16 PM maobibo <maobibo@loongson.cn> wrote:
>>
>>
>>
>> On 2024/10/18 12:11 PM, Huacai Chen wrote:
>>> On Fri, Oct 18, 2024 at 11:44 AM maobibo <maobibo@loongson.cn> wrote:
>>>>
>>>>
>>>>
>>>> On 2024/10/18 11:14 AM, Huacai Chen wrote:
>>>>> Hi, Bibo,
>>>>>
>>>>> I applied this patch but drop the part of arch/loongarch/mm/kasan_init.c:
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson.git/commit/?h=loongarch-next&id=15832255e84494853f543b4c70ced50afc403067
>>>>>
>>>>> Because kernel_pte_init() should operate on page-table pages, not on
>>>>> data pages. You have already handle page-table page in
>>>>> mm/kasan/init.c, and if we don't drop the modification on data pages
>>>>> in arch/loongarch/mm/kasan_init.c, the kernel fail to boot if KASAN is
>>>>> enabled.
>>>>>
>>>> static inline void set_pte(pte_t *ptep, pte_t pteval)
>>>>     {
>>>>           WRITE_ONCE(*ptep, pteval);
>>>> -
>>>> -       if (pte_val(pteval) & _PAGE_GLOBAL) {
>>>> -               pte_t *buddy = ptep_buddy(ptep);
>>>> -               /*
>>>> -                * Make sure the buddy is global too (if it's !none,
>>>> -                * it better already be global)
>>>> -                */
>>>> -               if (pte_none(ptep_get(buddy))) {
>>>> -#ifdef CONFIG_SMP
>>>> -                       /*
>>>> -                        * For SMP, multiple CPUs can race, so we need
>>>> -                        * to do this atomically.
>>>> -                        */
>>>> -                       __asm__ __volatile__(
>>>> -                       __AMOR "$zero, %[global], %[buddy] \n"
>>>> -                       : [buddy] "+ZB" (buddy->pte)
>>>> -                       : [global] "r" (_PAGE_GLOBAL)
>>>> -                       : "memory");
>>>> -
>>>> -                       DBAR(0b11000); /* o_wrw = 0b11000 */
>>>> -#else /* !CONFIG_SMP */
>>>> -                       WRITE_ONCE(*buddy, __pte(pte_val(ptep_get(buddy)) | _PAGE_GLOBAL));
>>>> -#endif /* CONFIG_SMP */
>>>> -               }
>>>> -       }
>>>> +       DBAR(0b11000); /* o_wrw = 0b11000 */
>>>>     }
>>>>
>>>> No, please hold on. This issue exists about twenty years, Do we need be
>>>> in such a hurry now?
>>>>
>>>> why is DBAR(0b11000) added in set_pte()?
>>> It exists before, not added by this patch. The reason is explained in
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.12-rc3&id=f93f67d06b1023313ef1662eac490e29c025c030
>> why speculative accesses may cause spurious page fault in kernel space
>> with PTE enabled?  speculative accesses exists anywhere, it does not
>> cause spurious page fault.
> Confirmed by Ruiyang Wu, and even if DBAR(0b11000) is wrong, that
> means another patch's mistake, not this one. This one just keeps the
> old behavior.
> +CC Ruiyang Wu here.
Also from Ruiyang Wu, the information is that speculative accesses may
insert a stale TLB entry, but they do not raise a page fault exception.

So adding a barrier in set_pte() does not prevent speculative accesses.
And you wrote the patch here, yet you do not know the actual reason?

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.12-rc3&id=f93f67d06b1023313ef1662eac490e29c025c030

Bibo Mao
> 
> Huacai
> 
>>
>> Obvious you do not it and you write wrong patch.
>>
>>>
>>> Huacai
>>>
>>>>
>>>> Regards
>>>> Bibo Mao
>>>>> Huacai
>>>>>
>>>>> On Mon, Oct 14, 2024 at 11:59 AM Bibo Mao <maobibo@loongson.cn> wrote:
>>>>>>
>>>>>> Unlike general architectures, there are two pages in one TLB entry
>>>>>> on LoongArch system. For kernel space, it requires both two pte
>>>>>> entries with PAGE_GLOBAL bit set, else HW treats it as non-global
>>>>>> tlb, there will be potential problems if tlb entry for kernel space
>>>>>> is not global. Such as fail to flush kernel tlb with function
>>>>>> local_flush_tlb_kernel_range() which only flush tlb with global bit.
>>>>>>
>>>>>> With function kernel_pte_init() added, it can be used to init pte
>>>>>> table when it is created for kernel address space, and the default
>>>>>> initial pte value is PAGE_GLOBAL rather than zero at beginning.
>>>>>>
>>>>>> Kernel address space areas includes fixmap, percpu, vmalloc, kasan
>>>>>> and vmemmap areas set default pte entry with PAGE_GLOBAL set.
>>>>>>
>>>>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>>>>>> ---
>>>>>>     arch/loongarch/include/asm/pgalloc.h | 13 +++++++++++++
>>>>>>     arch/loongarch/include/asm/pgtable.h |  1 +
>>>>>>     arch/loongarch/mm/init.c             |  4 +++-
>>>>>>     arch/loongarch/mm/kasan_init.c       |  4 +++-
>>>>>>     arch/loongarch/mm/pgtable.c          | 22 ++++++++++++++++++++++
>>>>>>     include/linux/mm.h                   |  1 +
>>>>>>     mm/kasan/init.c                      |  8 +++++++-
>>>>>>     mm/sparse-vmemmap.c                  |  5 +++++
>>>>>>     8 files changed, 55 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/loongarch/include/asm/pgalloc.h b/arch/loongarch/include/asm/pgalloc.h
>>>>>> index 4e2d6b7ca2ee..b2698c03dc2c 100644
>>>>>> --- a/arch/loongarch/include/asm/pgalloc.h
>>>>>> +++ b/arch/loongarch/include/asm/pgalloc.h
>>>>>> @@ -10,8 +10,21 @@
>>>>>>
>>>>>>     #define __HAVE_ARCH_PMD_ALLOC_ONE
>>>>>>     #define __HAVE_ARCH_PUD_ALLOC_ONE
>>>>>> +#define __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
>>>>>>     #include <asm-generic/pgalloc.h>
>>>>>>
>>>>>> +static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
>>>>>> +{
>>>>>> +       pte_t *pte;
>>>>>> +
>>>>>> +       pte = (pte_t *) __get_free_page(GFP_KERNEL);
>>>>>> +       if (!pte)
>>>>>> +               return NULL;
>>>>>> +
>>>>>> +       kernel_pte_init(pte);
>>>>>> +       return pte;
>>>>>> +}
>>>>>> +
>>>>>>     static inline void pmd_populate_kernel(struct mm_struct *mm,
>>>>>>                                           pmd_t *pmd, pte_t *pte)
>>>>>>     {
>>>>>> diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
>>>>>> index 9965f52ef65b..22e3a8f96213 100644
>>>>>> --- a/arch/loongarch/include/asm/pgtable.h
>>>>>> +++ b/arch/loongarch/include/asm/pgtable.h
>>>>>> @@ -269,6 +269,7 @@ extern void set_pmd_at(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp, pm
>>>>>>     extern void pgd_init(void *addr);
>>>>>>     extern void pud_init(void *addr);
>>>>>>     extern void pmd_init(void *addr);
>>>>>> +extern void kernel_pte_init(void *addr);
>>>>>>
>>>>>>     /*
>>>>>>      * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
>>>>>> diff --git a/arch/loongarch/mm/init.c b/arch/loongarch/mm/init.c
>>>>>> index 8a87a482c8f4..9f26e933a8a3 100644
>>>>>> --- a/arch/loongarch/mm/init.c
>>>>>> +++ b/arch/loongarch/mm/init.c
>>>>>> @@ -198,9 +198,11 @@ pte_t * __init populate_kernel_pte(unsigned long addr)
>>>>>>            if (!pmd_present(pmdp_get(pmd))) {
>>>>>>                    pte_t *pte;
>>>>>>
>>>>>> -               pte = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
>>>>>> +               pte = memblock_alloc_raw(PAGE_SIZE, PAGE_SIZE);
>>>>>>                    if (!pte)
>>>>>>                            panic("%s: Failed to allocate memory\n", __func__);
>>>>>> +
>>>>>> +               kernel_pte_init(pte);
>>>>>>                    pmd_populate_kernel(&init_mm, pmd, pte);
>>>>>>            }
>>>>>>
>>>>>> diff --git a/arch/loongarch/mm/kasan_init.c b/arch/loongarch/mm/kasan_init.c
>>>>>> index 427d6b1aec09..34988573b0d5 100644
>>>>>> --- a/arch/loongarch/mm/kasan_init.c
>>>>>> +++ b/arch/loongarch/mm/kasan_init.c
>>>>>> @@ -152,6 +152,8 @@ static void __init kasan_pte_populate(pmd_t *pmdp, unsigned long addr,
>>>>>>                    phys_addr_t page_phys = early ?
>>>>>>                                            __pa_symbol(kasan_early_shadow_page)
>>>>>>                                                  : kasan_alloc_zeroed_page(node);
>>>>>> +               if (!early)
>>>>>> +                       kernel_pte_init(__va(page_phys));
>>>>>>                    next = addr + PAGE_SIZE;
>>>>>>                    set_pte(ptep, pfn_pte(__phys_to_pfn(page_phys), PAGE_KERNEL));
>>>>>>            } while (ptep++, addr = next, addr != end && __pte_none(early, ptep_get(ptep)));
>>>>>> @@ -287,7 +289,7 @@ void __init kasan_init(void)
>>>>>>                    set_pte(&kasan_early_shadow_pte[i],
>>>>>>                            pfn_pte(__phys_to_pfn(__pa_symbol(kasan_early_shadow_page)), PAGE_KERNEL_RO));
>>>>>>
>>>>>> -       memset(kasan_early_shadow_page, 0, PAGE_SIZE);
>>>>>> +       kernel_pte_init(kasan_early_shadow_page);
>>>>>>            csr_write64(__pa_symbol(swapper_pg_dir), LOONGARCH_CSR_PGDH);
>>>>>>            local_flush_tlb_all();
>>>>>>
>>>>>> diff --git a/arch/loongarch/mm/pgtable.c b/arch/loongarch/mm/pgtable.c
>>>>>> index eb6a29b491a7..228ffc1db0a3 100644
>>>>>> --- a/arch/loongarch/mm/pgtable.c
>>>>>> +++ b/arch/loongarch/mm/pgtable.c
>>>>>> @@ -38,6 +38,28 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
>>>>>>     }
>>>>>>     EXPORT_SYMBOL_GPL(pgd_alloc);
>>>>>>
>>>>>> +void kernel_pte_init(void *addr)
>>>>>> +{
>>>>>> +       unsigned long *p, *end;
>>>>>> +       unsigned long entry;
>>>>>> +
>>>>>> +       entry = (unsigned long)_PAGE_GLOBAL;
>>>>>> +       p = (unsigned long *)addr;
>>>>>> +       end = p + PTRS_PER_PTE;
>>>>>> +
>>>>>> +       do {
>>>>>> +               p[0] = entry;
>>>>>> +               p[1] = entry;
>>>>>> +               p[2] = entry;
>>>>>> +               p[3] = entry;
>>>>>> +               p[4] = entry;
>>>>>> +               p += 8;
>>>>>> +               p[-3] = entry;
>>>>>> +               p[-2] = entry;
>>>>>> +               p[-1] = entry;
>>>>>> +       } while (p != end);
>>>>>> +}
>>>>>> +
>>>>>>     void pgd_init(void *addr)
>>>>>>     {
>>>>>>            unsigned long *p, *end;
>>>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>>>> index ecf63d2b0582..6909fe059a2c 100644
>>>>>> --- a/include/linux/mm.h
>>>>>> +++ b/include/linux/mm.h
>>>>>> @@ -3818,6 +3818,7 @@ void *sparse_buffer_alloc(unsigned long size);
>>>>>>     struct page * __populate_section_memmap(unsigned long pfn,
>>>>>>                    unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
>>>>>>                    struct dev_pagemap *pgmap);
>>>>>> +void kernel_pte_init(void *addr);
>>>>>>     void pmd_init(void *addr);
>>>>>>     void pud_init(void *addr);
>>>>>>     pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
>>>>>> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
>>>>>> index 89895f38f722..ac607c306292 100644
>>>>>> --- a/mm/kasan/init.c
>>>>>> +++ b/mm/kasan/init.c
>>>>>> @@ -106,6 +106,10 @@ static void __ref zero_pte_populate(pmd_t *pmd, unsigned long addr,
>>>>>>            }
>>>>>>     }
>>>>>>
>>>>>> +void __weak __meminit kernel_pte_init(void *addr)
>>>>>> +{
>>>>>> +}
>>>>>> +
>>>>>>     static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
>>>>>>                                    unsigned long end)
>>>>>>     {
>>>>>> @@ -126,8 +130,10 @@ static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
>>>>>>
>>>>>>                            if (slab_is_available())
>>>>>>                                    p = pte_alloc_one_kernel(&init_mm);
>>>>>> -                       else
>>>>>> +                       else {
>>>>>>                                    p = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
>>>>>> +                               kernel_pte_init(p);
>>>>>> +                       }
>>>>>>                            if (!p)
>>>>>>                                    return -ENOMEM;
>>>>>>
>>>>>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>>>>>> index edcc7a6b0f6f..c0388b2e959d 100644
>>>>>> --- a/mm/sparse-vmemmap.c
>>>>>> +++ b/mm/sparse-vmemmap.c
>>>>>> @@ -184,6 +184,10 @@ static void * __meminit vmemmap_alloc_block_zero(unsigned long size, int node)
>>>>>>            return p;
>>>>>>     }
>>>>>>
>>>>>> +void __weak __meminit kernel_pte_init(void *addr)
>>>>>> +{
>>>>>> +}
>>>>>> +
>>>>>>     pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
>>>>>>     {
>>>>>>            pmd_t *pmd = pmd_offset(pud, addr);
>>>>>> @@ -191,6 +195,7 @@ pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
>>>>>>                    void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
>>>>>>                    if (!p)
>>>>>>                            return NULL;
>>>>>> +               kernel_pte_init(p);
>>>>>>                    pmd_populate_kernel(&init_mm, pmd, p);
>>>>>>            }
>>>>>>            return pmd;
>>>>>> --
>>>>>> 2.39.3
>>>>>>
>>>>
>>>>
>>
>>



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/3] LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space
  2024-10-18  6:23             ` maobibo
@ 2024-10-18  6:32               ` Huacai Chen
  2024-10-21  1:22                 ` maobibo
  0 siblings, 1 reply; 21+ messages in thread
From: Huacai Chen @ 2024-10-18  6:32 UTC (permalink / raw)
  To: maobibo
  Cc: wuruiyang, Andrey Ryabinin, Andrew Morton, David Hildenbrand,
	Barry Song, loongarch, linux-kernel, kasan-dev, linux-mm

On Fri, Oct 18, 2024 at 2:23 PM maobibo <maobibo@loongson.cn> wrote:
>
>
>
> On 2024/10/18 下午12:23, Huacai Chen wrote:
> > On Fri, Oct 18, 2024 at 12:16 PM maobibo <maobibo@loongson.cn> wrote:
> >>
> >>
> >>
> >> On 2024/10/18 下午12:11, Huacai Chen wrote:
> >>> On Fri, Oct 18, 2024 at 11:44 AM maobibo <maobibo@loongson.cn> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2024/10/18 上午11:14, Huacai Chen wrote:
> >>>>> Hi, Bibo,
> >>>>>
> >>>>> I applied this patch but drop the part of arch/loongarch/mm/kasan_init.c:
> >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson.git/commit/?h=loongarch-next&id=15832255e84494853f543b4c70ced50afc403067
> >>>>>
> >>>>> Because kernel_pte_init() should operate on page-table pages, not on
> >>>>> data pages. You have already handle page-table page in
> >>>>> mm/kasan/init.c, and if we don't drop the modification on data pages
> >>>>> in arch/loongarch/mm/kasan_init.c, the kernel fail to boot if KASAN is
> >>>>> enabled.
> >>>>>
> >>>> static inline void set_pte(pte_t *ptep, pte_t pteval)
> >>>>     {
> >>>>           WRITE_ONCE(*ptep, pteval);
> >>>> -
> >>>> -       if (pte_val(pteval) & _PAGE_GLOBAL) {
> >>>> -               pte_t *buddy = ptep_buddy(ptep);
> >>>> -               /*
> >>>> -                * Make sure the buddy is global too (if it's !none,
> >>>> -                * it better already be global)
> >>>> -                */
> >>>> -               if (pte_none(ptep_get(buddy))) {
> >>>> -#ifdef CONFIG_SMP
> >>>> -                       /*
> >>>> -                        * For SMP, multiple CPUs can race, so we need
> >>>> -                        * to do this atomically.
> >>>> -                        */
> >>>> -                       __asm__ __volatile__(
> >>>> -                       __AMOR "$zero, %[global], %[buddy] \n"
> >>>> -                       : [buddy] "+ZB" (buddy->pte)
> >>>> -                       : [global] "r" (_PAGE_GLOBAL)
> >>>> -                       : "memory");
> >>>> -
> >>>> -                       DBAR(0b11000); /* o_wrw = 0b11000 */
> >>>> -#else /* !CONFIG_SMP */
> >>>> -                       WRITE_ONCE(*buddy, __pte(pte_val(ptep_get(buddy)) | _PAGE_GLOBAL));
> >>>> -#endif /* CONFIG_SMP */
> >>>> -               }
> >>>> -       }
> >>>> +       DBAR(0b11000); /* o_wrw = 0b11000 */
> >>>>     }
> >>>>
> >>>> No, please hold on. This issue exists about twenty years, Do we need be
> >>>> in such a hurry now?
> >>>>
> >>>> why is DBAR(0b11000) added in set_pte()?
> >>> It exists before, not added by this patch. The reason is explained in
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.12-rc3&id=f93f67d06b1023313ef1662eac490e29c025c030
> >> why speculative accesses may cause spurious page fault in kernel space
> >> with PTE enabled?  speculative accesses exists anywhere, it does not
> >> cause spurious page fault.
> > Confirmed by Ruiyang Wu, and even if DBAR(0b11000) is wrong, that
> > means another patch's mistake, not this one. This one just keeps the
> > old behavior.
> > +CC Ruiyang Wu here.
> Also from Ruiyang Wu, the information is that speculative accesses may
> insert stale TLB, however no page fault exception.
>
> So adding barrier in set_pte() does not prevent speculative accesses.
> And you write patch here, however do not know the actual reason?
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.12-rc3&id=f93f67d06b1023313ef1662eac490e29c025c030
I have CCed Ruiyang; he can judge whether the description is correct.

Huacai

>
> Bibo Mao
> >
> > Huacai
> >
> >>
> >> Obvious you do not it and you write wrong patch.
> >>
> >>>
> >>> Huacai
> >>>
> >>>>
> >>>> Regards
> >>>> Bibo Mao
> >>>>> Huacai
> >>>>>
> >>>>> On Mon, Oct 14, 2024 at 11:59 AM Bibo Mao <maobibo@loongson.cn> wrote:
> >>>>>>
> >>>>>> Unlike general architectures, there are two pages in one TLB entry
> >>>>>> on LoongArch system. For kernel space, it requires both two pte
> >>>>>> entries with PAGE_GLOBAL bit set, else HW treats it as non-global
> >>>>>> tlb, there will be potential problems if tlb entry for kernel space
> >>>>>> is not global. Such as fail to flush kernel tlb with function
> >>>>>> local_flush_tlb_kernel_range() which only flush tlb with global bit.
> >>>>>>
> >>>>>> With function kernel_pte_init() added, it can be used to init pte
> >>>>>> table when it is created for kernel address space, and the default
> >>>>>> initial pte value is PAGE_GLOBAL rather than zero at beginning.
> >>>>>>
> >>>>>> Kernel address space areas includes fixmap, percpu, vmalloc, kasan
> >>>>>> and vmemmap areas set default pte entry with PAGE_GLOBAL set.
> >>>>>>
> >>>>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> >>>>>> ---
> >>>>>>     arch/loongarch/include/asm/pgalloc.h | 13 +++++++++++++
> >>>>>>     arch/loongarch/include/asm/pgtable.h |  1 +
> >>>>>>     arch/loongarch/mm/init.c             |  4 +++-
> >>>>>>     arch/loongarch/mm/kasan_init.c       |  4 +++-
> >>>>>>     arch/loongarch/mm/pgtable.c          | 22 ++++++++++++++++++++++
> >>>>>>     include/linux/mm.h                   |  1 +
> >>>>>>     mm/kasan/init.c                      |  8 +++++++-
> >>>>>>     mm/sparse-vmemmap.c                  |  5 +++++
> >>>>>>     8 files changed, 55 insertions(+), 3 deletions(-)
> >>>>>>
> >>>>>> diff --git a/arch/loongarch/include/asm/pgalloc.h b/arch/loongarch/include/asm/pgalloc.h
> >>>>>> index 4e2d6b7ca2ee..b2698c03dc2c 100644
> >>>>>> --- a/arch/loongarch/include/asm/pgalloc.h
> >>>>>> +++ b/arch/loongarch/include/asm/pgalloc.h
> >>>>>> @@ -10,8 +10,21 @@
> >>>>>>
> >>>>>>     #define __HAVE_ARCH_PMD_ALLOC_ONE
> >>>>>>     #define __HAVE_ARCH_PUD_ALLOC_ONE
> >>>>>> +#define __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
> >>>>>>     #include <asm-generic/pgalloc.h>
> >>>>>>
> >>>>>> +static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
> >>>>>> +{
> >>>>>> +       pte_t *pte;
> >>>>>> +
> >>>>>> +       pte = (pte_t *) __get_free_page(GFP_KERNEL);
> >>>>>> +       if (!pte)
> >>>>>> +               return NULL;
> >>>>>> +
> >>>>>> +       kernel_pte_init(pte);
> >>>>>> +       return pte;
> >>>>>> +}
> >>>>>> +
> >>>>>>     static inline void pmd_populate_kernel(struct mm_struct *mm,
> >>>>>>                                           pmd_t *pmd, pte_t *pte)
> >>>>>>     {
> >>>>>> diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
> >>>>>> index 9965f52ef65b..22e3a8f96213 100644
> >>>>>> --- a/arch/loongarch/include/asm/pgtable.h
> >>>>>> +++ b/arch/loongarch/include/asm/pgtable.h
> >>>>>> @@ -269,6 +269,7 @@ extern void set_pmd_at(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp, pm
> >>>>>>     extern void pgd_init(void *addr);
> >>>>>>     extern void pud_init(void *addr);
> >>>>>>     extern void pmd_init(void *addr);
> >>>>>> +extern void kernel_pte_init(void *addr);
> >>>>>>
> >>>>>>     /*
> >>>>>>      * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
> >>>>>> diff --git a/arch/loongarch/mm/init.c b/arch/loongarch/mm/init.c
> >>>>>> index 8a87a482c8f4..9f26e933a8a3 100644
> >>>>>> --- a/arch/loongarch/mm/init.c
> >>>>>> +++ b/arch/loongarch/mm/init.c
> >>>>>> @@ -198,9 +198,11 @@ pte_t * __init populate_kernel_pte(unsigned long addr)
> >>>>>>            if (!pmd_present(pmdp_get(pmd))) {
> >>>>>>                    pte_t *pte;
> >>>>>>
> >>>>>> -               pte = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
> >>>>>> +               pte = memblock_alloc_raw(PAGE_SIZE, PAGE_SIZE);
> >>>>>>                    if (!pte)
> >>>>>>                            panic("%s: Failed to allocate memory\n", __func__);
> >>>>>> +
> >>>>>> +               kernel_pte_init(pte);
> >>>>>>                    pmd_populate_kernel(&init_mm, pmd, pte);
> >>>>>>            }
> >>>>>>
> >>>>>> diff --git a/arch/loongarch/mm/kasan_init.c b/arch/loongarch/mm/kasan_init.c
> >>>>>> index 427d6b1aec09..34988573b0d5 100644
> >>>>>> --- a/arch/loongarch/mm/kasan_init.c
> >>>>>> +++ b/arch/loongarch/mm/kasan_init.c
> >>>>>> @@ -152,6 +152,8 @@ static void __init kasan_pte_populate(pmd_t *pmdp, unsigned long addr,
> >>>>>>                    phys_addr_t page_phys = early ?
> >>>>>>                                            __pa_symbol(kasan_early_shadow_page)
> >>>>>>                                                  : kasan_alloc_zeroed_page(node);
> >>>>>> +               if (!early)
> >>>>>> +                       kernel_pte_init(__va(page_phys));
> >>>>>>                    next = addr + PAGE_SIZE;
> >>>>>>                    set_pte(ptep, pfn_pte(__phys_to_pfn(page_phys), PAGE_KERNEL));
> >>>>>>            } while (ptep++, addr = next, addr != end && __pte_none(early, ptep_get(ptep)));
> >>>>>> @@ -287,7 +289,7 @@ void __init kasan_init(void)
> >>>>>>                    set_pte(&kasan_early_shadow_pte[i],
> >>>>>>                            pfn_pte(__phys_to_pfn(__pa_symbol(kasan_early_shadow_page)), PAGE_KERNEL_RO));
> >>>>>>
> >>>>>> -       memset(kasan_early_shadow_page, 0, PAGE_SIZE);
> >>>>>> +       kernel_pte_init(kasan_early_shadow_page);
> >>>>>>            csr_write64(__pa_symbol(swapper_pg_dir), LOONGARCH_CSR_PGDH);
> >>>>>>            local_flush_tlb_all();
> >>>>>>
> >>>>>> diff --git a/arch/loongarch/mm/pgtable.c b/arch/loongarch/mm/pgtable.c
> >>>>>> index eb6a29b491a7..228ffc1db0a3 100644
> >>>>>> --- a/arch/loongarch/mm/pgtable.c
> >>>>>> +++ b/arch/loongarch/mm/pgtable.c
> >>>>>> @@ -38,6 +38,28 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
> >>>>>>     }
> >>>>>>     EXPORT_SYMBOL_GPL(pgd_alloc);
> >>>>>>
> >>>>>> +void kernel_pte_init(void *addr)
> >>>>>> +{
> >>>>>> +       unsigned long *p, *end;
> >>>>>> +       unsigned long entry;
> >>>>>> +
> >>>>>> +       entry = (unsigned long)_PAGE_GLOBAL;
> >>>>>> +       p = (unsigned long *)addr;
> >>>>>> +       end = p + PTRS_PER_PTE;
> >>>>>> +
> >>>>>> +       do {
> >>>>>> +               p[0] = entry;
> >>>>>> +               p[1] = entry;
> >>>>>> +               p[2] = entry;
> >>>>>> +               p[3] = entry;
> >>>>>> +               p[4] = entry;
> >>>>>> +               p += 8;
> >>>>>> +               p[-3] = entry;
> >>>>>> +               p[-2] = entry;
> >>>>>> +               p[-1] = entry;
> >>>>>> +       } while (p != end);
> >>>>>> +}
> >>>>>> +
> >>>>>>     void pgd_init(void *addr)
> >>>>>>     {
> >>>>>>            unsigned long *p, *end;
> >>>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >>>>>> index ecf63d2b0582..6909fe059a2c 100644
> >>>>>> --- a/include/linux/mm.h
> >>>>>> +++ b/include/linux/mm.h
> >>>>>> @@ -3818,6 +3818,7 @@ void *sparse_buffer_alloc(unsigned long size);
> >>>>>>     struct page * __populate_section_memmap(unsigned long pfn,
> >>>>>>                    unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
> >>>>>>                    struct dev_pagemap *pgmap);
> >>>>>> +void kernel_pte_init(void *addr);
> >>>>>>     void pmd_init(void *addr);
> >>>>>>     void pud_init(void *addr);
> >>>>>>     pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
> >>>>>> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
> >>>>>> index 89895f38f722..ac607c306292 100644
> >>>>>> --- a/mm/kasan/init.c
> >>>>>> +++ b/mm/kasan/init.c
> >>>>>> @@ -106,6 +106,10 @@ static void __ref zero_pte_populate(pmd_t *pmd, unsigned long addr,
> >>>>>>            }
> >>>>>>     }
> >>>>>>
> >>>>>> +void __weak __meminit kernel_pte_init(void *addr)
> >>>>>> +{
> >>>>>> +}
> >>>>>> +
> >>>>>>     static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
> >>>>>>                                    unsigned long end)
> >>>>>>     {
> >>>>>> @@ -126,8 +130,10 @@ static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
> >>>>>>
> >>>>>>                            if (slab_is_available())
> >>>>>>                                    p = pte_alloc_one_kernel(&init_mm);
> >>>>>> -                       else
> >>>>>> +                       else {
> >>>>>>                                    p = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
> >>>>>> +                               kernel_pte_init(p);
> >>>>>> +                       }
> >>>>>>                            if (!p)
> >>>>>>                                    return -ENOMEM;
> >>>>>>
> >>>>>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> >>>>>> index edcc7a6b0f6f..c0388b2e959d 100644
> >>>>>> --- a/mm/sparse-vmemmap.c
> >>>>>> +++ b/mm/sparse-vmemmap.c
> >>>>>> @@ -184,6 +184,10 @@ static void * __meminit vmemmap_alloc_block_zero(unsigned long size, int node)
> >>>>>>            return p;
> >>>>>>     }
> >>>>>>
> >>>>>> +void __weak __meminit kernel_pte_init(void *addr)
> >>>>>> +{
> >>>>>> +}
> >>>>>> +
> >>>>>>     pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
> >>>>>>     {
> >>>>>>            pmd_t *pmd = pmd_offset(pud, addr);
> >>>>>> @@ -191,6 +195,7 @@ pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
> >>>>>>                    void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
> >>>>>>                    if (!p)
> >>>>>>                            return NULL;
> >>>>>> +               kernel_pte_init(p);
> >>>>>>                    pmd_populate_kernel(&init_mm, pmd, p);
> >>>>>>            }
> >>>>>>            return pmd;
> >>>>>> --
> >>>>>> 2.39.3
> >>>>>>
> >>>>
> >>>>
> >>
> >>
>
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/3] LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space
  2024-10-18  6:32               ` Huacai Chen
@ 2024-10-21  1:22                 ` maobibo
  2024-10-21 10:13                   ` Huacai Chen
  0 siblings, 1 reply; 21+ messages in thread
From: maobibo @ 2024-10-21  1:22 UTC (permalink / raw)
  To: Huacai Chen
  Cc: wuruiyang, Andrey Ryabinin, Andrew Morton, David Hildenbrand,
	Barry Song, loongarch, linux-kernel, kasan-dev, linux-mm



On 2024/10/18 下午2:32, Huacai Chen wrote:
> On Fri, Oct 18, 2024 at 2:23 PM maobibo <maobibo@loongson.cn> wrote:
>>
>>
>>
>> On 2024/10/18 下午12:23, Huacai Chen wrote:
>>> On Fri, Oct 18, 2024 at 12:16 PM maobibo <maobibo@loongson.cn> wrote:
>>>>
>>>>
>>>>
>>>> On 2024/10/18 下午12:11, Huacai Chen wrote:
>>>>> On Fri, Oct 18, 2024 at 11:44 AM maobibo <maobibo@loongson.cn> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2024/10/18 上午11:14, Huacai Chen wrote:
>>>>>>> Hi, Bibo,
>>>>>>>
>>>>>>> I applied this patch but drop the part of arch/loongarch/mm/kasan_init.c:
>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson.git/commit/?h=loongarch-next&id=15832255e84494853f543b4c70ced50afc403067
>>>>>>>
>>>>>>> Because kernel_pte_init() should operate on page-table pages, not on
>>>>>>> data pages. You have already handle page-table page in
>>>>>>> mm/kasan/init.c, and if we don't drop the modification on data pages
>>>>>>> in arch/loongarch/mm/kasan_init.c, the kernel fail to boot if KASAN is
>>>>>>> enabled.
>>>>>>>
>>>>>> static inline void set_pte(pte_t *ptep, pte_t pteval)
>>>>>>      {
>>>>>>            WRITE_ONCE(*ptep, pteval);
>>>>>> -
>>>>>> -       if (pte_val(pteval) & _PAGE_GLOBAL) {
>>>>>> -               pte_t *buddy = ptep_buddy(ptep);
>>>>>> -               /*
>>>>>> -                * Make sure the buddy is global too (if it's !none,
>>>>>> -                * it better already be global)
>>>>>> -                */
>>>>>> -               if (pte_none(ptep_get(buddy))) {
>>>>>> -#ifdef CONFIG_SMP
>>>>>> -                       /*
>>>>>> -                        * For SMP, multiple CPUs can race, so we need
>>>>>> -                        * to do this atomically.
>>>>>> -                        */
>>>>>> -                       __asm__ __volatile__(
>>>>>> -                       __AMOR "$zero, %[global], %[buddy] \n"
>>>>>> -                       : [buddy] "+ZB" (buddy->pte)
>>>>>> -                       : [global] "r" (_PAGE_GLOBAL)
>>>>>> -                       : "memory");
>>>>>> -
>>>>>> -                       DBAR(0b11000); /* o_wrw = 0b11000 */
>>>>>> -#else /* !CONFIG_SMP */
>>>>>> -                       WRITE_ONCE(*buddy, __pte(pte_val(ptep_get(buddy)) | _PAGE_GLOBAL));
>>>>>> -#endif /* CONFIG_SMP */
>>>>>> -               }
>>>>>> -       }
>>>>>> +       DBAR(0b11000); /* o_wrw = 0b11000 */
>>>>>>      }
>>>>>>
>>>>>> No, please hold on. This issue exists about twenty years, Do we need be
>>>>>> in such a hurry now?
>>>>>>
>>>>>> why is DBAR(0b11000) added in set_pte()?
>>>>> It exists before, not added by this patch. The reason is explained in
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.12-rc3&id=f93f67d06b1023313ef1662eac490e29c025c030
>>>> why speculative accesses may cause spurious page fault in kernel space
>>>> with PTE enabled?  speculative accesses exists anywhere, it does not
>>>> cause spurious page fault.
>>> Confirmed by Ruiyang Wu, and even if DBAR(0b11000) is wrong, that
>>> means another patch's mistake, not this one. This one just keeps the
>>> old behavior.
>>> +CC Ruiyang Wu here.
>> Also from Ruiyang Wu, the information is that speculative accesses may
>> insert stale TLB, however no page fault exception.
>>
>> So adding barrier in set_pte() does not prevent speculative accesses.
>> And you write patch here, however do not know the actual reason?
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.12-rc3&id=f93f67d06b1023313ef1662eac490e29c025c030
> I have CCed Ruiyang, whether the description is correct can be judged by him.

There are some problems with adding the barrier in set_pte():

1. This issue only exists when the HW page table walker (PTW) is enabled,
and only for the kernel address space, is that right? Also, adding the
barrier in set_pte() may be too heavy compared with doing it once in
flush_cache_vmap() (see the sketch below).
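
For reference, a minimal sketch of that alternative (assuming
flush_cache_vmap() is otherwise a no-op on LoongArch and that the DBAR()
macro from asm/barrier.h is visible there; the hint is the same one used
in set_pte()):

/* arch/loongarch/include/asm/cacheflush.h -- illustrative sketch only */
#define flush_cache_vmap(start, end)					\
do {									\
	/* one barrier per new vmap area, not one per set_pte() */	\
	DBAR(0b11000); /* o_wrw = 0b11000 */				\
} while (0)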

2. LoongArch is different from other architectures in that two pages are
covered by one TLB entry. If two consecutive pages are mapped and accessed
one after another, there will be a page fault on the second memory access.
Such as:
    addr1 = percpu_alloc(pagesize);
    val1 = *(int *)addr1;
      // After the page table walk, addr1 is present and addr2 is pte_none,
      // so the TLB entry holds a valid pte for addr1, invalid pte for addr2
    addr2 = percpu_alloc(pagesize); // does not flush the tlb the first time
    val2 = *(int *)addr2;
      // In the page table, addr1 and addr2 are now both present, but the
      // stale TLB entry still has a valid pte for addr1, invalid pte for addr2
    So there will be a page fault when accessing address addr2.

The same problem exists for the user address space. Besides, there is HW
prefetching, and one side effect of HW prefetching is that TLB entries may
be installed ahead of time. So a spurious page fault is possible whenever
memory is allocated and then accessed for the first time.

3. As for speculative execution: if the address is a user address, there
is an eret on return from the syscall, and eret rolls back all
speculatively executed instructions. So speculative execution is only a
problem for kernel addresses. And how can we verify whether a fault comes
from speculative execution or from the problem in clause 2?

Regards
Bibo Mao


> 
> Huacai
> 
>>
>> Bibo Mao
>>>
>>> Huacai
>>>
>>>>
>>>> Obvious you do not it and you write wrong patch.
>>>>
>>>>>
>>>>> Huacai
>>>>>
>>>>>>
>>>>>> Regards
>>>>>> Bibo Mao
>>>>>>> Huacai
>>>>>>>
>>>>>>> On Mon, Oct 14, 2024 at 11:59 AM Bibo Mao <maobibo@loongson.cn> wrote:
>>>>>>>>
>>>>>>>> Unlike general architectures, there are two pages in one TLB entry
>>>>>>>> on LoongArch system. For kernel space, it requires both two pte
>>>>>>>> entries with PAGE_GLOBAL bit set, else HW treats it as non-global
>>>>>>>> tlb, there will be potential problems if tlb entry for kernel space
>>>>>>>> is not global. Such as fail to flush kernel tlb with function
>>>>>>>> local_flush_tlb_kernel_range() which only flush tlb with global bit.
>>>>>>>>
>>>>>>>> With function kernel_pte_init() added, it can be used to init pte
>>>>>>>> table when it is created for kernel address space, and the default
>>>>>>>> initial pte value is PAGE_GLOBAL rather than zero at beginning.
>>>>>>>>
>>>>>>>> Kernel address space areas includes fixmap, percpu, vmalloc, kasan
>>>>>>>> and vmemmap areas set default pte entry with PAGE_GLOBAL set.
>>>>>>>>
>>>>>>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>>>>>>>> ---
>>>>>>>>      arch/loongarch/include/asm/pgalloc.h | 13 +++++++++++++
>>>>>>>>      arch/loongarch/include/asm/pgtable.h |  1 +
>>>>>>>>      arch/loongarch/mm/init.c             |  4 +++-
>>>>>>>>      arch/loongarch/mm/kasan_init.c       |  4 +++-
>>>>>>>>      arch/loongarch/mm/pgtable.c          | 22 ++++++++++++++++++++++
>>>>>>>>      include/linux/mm.h                   |  1 +
>>>>>>>>      mm/kasan/init.c                      |  8 +++++++-
>>>>>>>>      mm/sparse-vmemmap.c                  |  5 +++++
>>>>>>>>      8 files changed, 55 insertions(+), 3 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/arch/loongarch/include/asm/pgalloc.h b/arch/loongarch/include/asm/pgalloc.h
>>>>>>>> index 4e2d6b7ca2ee..b2698c03dc2c 100644
>>>>>>>> --- a/arch/loongarch/include/asm/pgalloc.h
>>>>>>>> +++ b/arch/loongarch/include/asm/pgalloc.h
>>>>>>>> @@ -10,8 +10,21 @@
>>>>>>>>
>>>>>>>>      #define __HAVE_ARCH_PMD_ALLOC_ONE
>>>>>>>>      #define __HAVE_ARCH_PUD_ALLOC_ONE
>>>>>>>> +#define __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
>>>>>>>>      #include <asm-generic/pgalloc.h>
>>>>>>>>
>>>>>>>> +static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
>>>>>>>> +{
>>>>>>>> +       pte_t *pte;
>>>>>>>> +
>>>>>>>> +       pte = (pte_t *) __get_free_page(GFP_KERNEL);
>>>>>>>> +       if (!pte)
>>>>>>>> +               return NULL;
>>>>>>>> +
>>>>>>>> +       kernel_pte_init(pte);
>>>>>>>> +       return pte;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>      static inline void pmd_populate_kernel(struct mm_struct *mm,
>>>>>>>>                                            pmd_t *pmd, pte_t *pte)
>>>>>>>>      {
>>>>>>>> diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
>>>>>>>> index 9965f52ef65b..22e3a8f96213 100644
>>>>>>>> --- a/arch/loongarch/include/asm/pgtable.h
>>>>>>>> +++ b/arch/loongarch/include/asm/pgtable.h
>>>>>>>> @@ -269,6 +269,7 @@ extern void set_pmd_at(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp, pm
>>>>>>>>      extern void pgd_init(void *addr);
>>>>>>>>      extern void pud_init(void *addr);
>>>>>>>>      extern void pmd_init(void *addr);
>>>>>>>> +extern void kernel_pte_init(void *addr);
>>>>>>>>
>>>>>>>>      /*
>>>>>>>>       * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
>>>>>>>> diff --git a/arch/loongarch/mm/init.c b/arch/loongarch/mm/init.c
>>>>>>>> index 8a87a482c8f4..9f26e933a8a3 100644
>>>>>>>> --- a/arch/loongarch/mm/init.c
>>>>>>>> +++ b/arch/loongarch/mm/init.c
>>>>>>>> @@ -198,9 +198,11 @@ pte_t * __init populate_kernel_pte(unsigned long addr)
>>>>>>>>             if (!pmd_present(pmdp_get(pmd))) {
>>>>>>>>                     pte_t *pte;
>>>>>>>>
>>>>>>>> -               pte = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
>>>>>>>> +               pte = memblock_alloc_raw(PAGE_SIZE, PAGE_SIZE);
>>>>>>>>                     if (!pte)
>>>>>>>>                             panic("%s: Failed to allocate memory\n", __func__);
>>>>>>>> +
>>>>>>>> +               kernel_pte_init(pte);
>>>>>>>>                     pmd_populate_kernel(&init_mm, pmd, pte);
>>>>>>>>             }
>>>>>>>>
>>>>>>>> diff --git a/arch/loongarch/mm/kasan_init.c b/arch/loongarch/mm/kasan_init.c
>>>>>>>> index 427d6b1aec09..34988573b0d5 100644
>>>>>>>> --- a/arch/loongarch/mm/kasan_init.c
>>>>>>>> +++ b/arch/loongarch/mm/kasan_init.c
>>>>>>>> @@ -152,6 +152,8 @@ static void __init kasan_pte_populate(pmd_t *pmdp, unsigned long addr,
>>>>>>>>                     phys_addr_t page_phys = early ?
>>>>>>>>                                             __pa_symbol(kasan_early_shadow_page)
>>>>>>>>                                                   : kasan_alloc_zeroed_page(node);
>>>>>>>> +               if (!early)
>>>>>>>> +                       kernel_pte_init(__va(page_phys));
>>>>>>>>                     next = addr + PAGE_SIZE;
>>>>>>>>                     set_pte(ptep, pfn_pte(__phys_to_pfn(page_phys), PAGE_KERNEL));
>>>>>>>>             } while (ptep++, addr = next, addr != end && __pte_none(early, ptep_get(ptep)));
>>>>>>>> @@ -287,7 +289,7 @@ void __init kasan_init(void)
>>>>>>>>                     set_pte(&kasan_early_shadow_pte[i],
>>>>>>>>                             pfn_pte(__phys_to_pfn(__pa_symbol(kasan_early_shadow_page)), PAGE_KERNEL_RO));
>>>>>>>>
>>>>>>>> -       memset(kasan_early_shadow_page, 0, PAGE_SIZE);
>>>>>>>> +       kernel_pte_init(kasan_early_shadow_page);
>>>>>>>>             csr_write64(__pa_symbol(swapper_pg_dir), LOONGARCH_CSR_PGDH);
>>>>>>>>             local_flush_tlb_all();
>>>>>>>>
>>>>>>>> diff --git a/arch/loongarch/mm/pgtable.c b/arch/loongarch/mm/pgtable.c
>>>>>>>> index eb6a29b491a7..228ffc1db0a3 100644
>>>>>>>> --- a/arch/loongarch/mm/pgtable.c
>>>>>>>> +++ b/arch/loongarch/mm/pgtable.c
>>>>>>>> @@ -38,6 +38,28 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
>>>>>>>>      }
>>>>>>>>      EXPORT_SYMBOL_GPL(pgd_alloc);
>>>>>>>>
>>>>>>>> +void kernel_pte_init(void *addr)
>>>>>>>> +{
>>>>>>>> +       unsigned long *p, *end;
>>>>>>>> +       unsigned long entry;
>>>>>>>> +
>>>>>>>> +       entry = (unsigned long)_PAGE_GLOBAL;
>>>>>>>> +       p = (unsigned long *)addr;
>>>>>>>> +       end = p + PTRS_PER_PTE;
>>>>>>>> +
>>>>>>>> +       do {
>>>>>>>> +               p[0] = entry;
>>>>>>>> +               p[1] = entry;
>>>>>>>> +               p[2] = entry;
>>>>>>>> +               p[3] = entry;
>>>>>>>> +               p[4] = entry;
>>>>>>>> +               p += 8;
>>>>>>>> +               p[-3] = entry;
>>>>>>>> +               p[-2] = entry;
>>>>>>>> +               p[-1] = entry;
>>>>>>>> +       } while (p != end);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>      void pgd_init(void *addr)
>>>>>>>>      {
>>>>>>>>             unsigned long *p, *end;
>>>>>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>>>>>> index ecf63d2b0582..6909fe059a2c 100644
>>>>>>>> --- a/include/linux/mm.h
>>>>>>>> +++ b/include/linux/mm.h
>>>>>>>> @@ -3818,6 +3818,7 @@ void *sparse_buffer_alloc(unsigned long size);
>>>>>>>>      struct page * __populate_section_memmap(unsigned long pfn,
>>>>>>>>                     unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
>>>>>>>>                     struct dev_pagemap *pgmap);
>>>>>>>> +void kernel_pte_init(void *addr);
>>>>>>>>      void pmd_init(void *addr);
>>>>>>>>      void pud_init(void *addr);
>>>>>>>>      pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
>>>>>>>> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
>>>>>>>> index 89895f38f722..ac607c306292 100644
>>>>>>>> --- a/mm/kasan/init.c
>>>>>>>> +++ b/mm/kasan/init.c
>>>>>>>> @@ -106,6 +106,10 @@ static void __ref zero_pte_populate(pmd_t *pmd, unsigned long addr,
>>>>>>>>             }
>>>>>>>>      }
>>>>>>>>
>>>>>>>> +void __weak __meminit kernel_pte_init(void *addr)
>>>>>>>> +{
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>      static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
>>>>>>>>                                     unsigned long end)
>>>>>>>>      {
>>>>>>>> @@ -126,8 +130,10 @@ static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
>>>>>>>>
>>>>>>>>                             if (slab_is_available())
>>>>>>>>                                     p = pte_alloc_one_kernel(&init_mm);
>>>>>>>> -                       else
>>>>>>>> +                       else {
>>>>>>>>                                     p = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
>>>>>>>> +                               kernel_pte_init(p);
>>>>>>>> +                       }
>>>>>>>>                             if (!p)
>>>>>>>>                                     return -ENOMEM;
>>>>>>>>
>>>>>>>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>>>>>>>> index edcc7a6b0f6f..c0388b2e959d 100644
>>>>>>>> --- a/mm/sparse-vmemmap.c
>>>>>>>> +++ b/mm/sparse-vmemmap.c
>>>>>>>> @@ -184,6 +184,10 @@ static void * __meminit vmemmap_alloc_block_zero(unsigned long size, int node)
>>>>>>>>             return p;
>>>>>>>>      }
>>>>>>>>
>>>>>>>> +void __weak __meminit kernel_pte_init(void *addr)
>>>>>>>> +{
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>      pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
>>>>>>>>      {
>>>>>>>>             pmd_t *pmd = pmd_offset(pud, addr);
>>>>>>>> @@ -191,6 +195,7 @@ pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
>>>>>>>>                     void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
>>>>>>>>                     if (!p)
>>>>>>>>                             return NULL;
>>>>>>>> +               kernel_pte_init(p);
>>>>>>>>                     pmd_populate_kernel(&init_mm, pmd, p);
>>>>>>>>             }
>>>>>>>>             return pmd;
>>>>>>>> --
>>>>>>>> 2.39.3
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/3] LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space
  2024-10-21  1:22                 ` maobibo
@ 2024-10-21 10:13                   ` Huacai Chen
  2024-10-22  1:39                     ` maobibo
  0 siblings, 1 reply; 21+ messages in thread
From: Huacai Chen @ 2024-10-21 10:13 UTC (permalink / raw)
  To: maobibo
  Cc: wuruiyang, Andrey Ryabinin, Andrew Morton, David Hildenbrand,
	Barry Song, loongarch, linux-kernel, kasan-dev, linux-mm

On Mon, Oct 21, 2024 at 9:23 AM maobibo <maobibo@loongson.cn> wrote:
>
>
>
> On 2024/10/18 下午2:32, Huacai Chen wrote:
> > On Fri, Oct 18, 2024 at 2:23 PM maobibo <maobibo@loongson.cn> wrote:
> >>
> >>
> >>
> >> On 2024/10/18 下午12:23, Huacai Chen wrote:
> >>> On Fri, Oct 18, 2024 at 12:16 PM maobibo <maobibo@loongson.cn> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2024/10/18 下午12:11, Huacai Chen wrote:
> >>>>> On Fri, Oct 18, 2024 at 11:44 AM maobibo <maobibo@loongson.cn> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 2024/10/18 上午11:14, Huacai Chen wrote:
> >>>>>>> Hi, Bibo,
> >>>>>>>
> >>>>>>> I applied this patch but drop the part of arch/loongarch/mm/kasan_init.c:
> >>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson.git/commit/?h=loongarch-next&id=15832255e84494853f543b4c70ced50afc403067
> >>>>>>>
> >>>>>>> Because kernel_pte_init() should operate on page-table pages, not on
> >>>>>>> data pages. You have already handle page-table page in
> >>>>>>> mm/kasan/init.c, and if we don't drop the modification on data pages
> >>>>>>> in arch/loongarch/mm/kasan_init.c, the kernel fail to boot if KASAN is
> >>>>>>> enabled.
> >>>>>>>
> >>>>>> static inline void set_pte(pte_t *ptep, pte_t pteval)
> >>>>>>      {
> >>>>>>            WRITE_ONCE(*ptep, pteval);
> >>>>>> -
> >>>>>> -       if (pte_val(pteval) & _PAGE_GLOBAL) {
> >>>>>> -               pte_t *buddy = ptep_buddy(ptep);
> >>>>>> -               /*
> >>>>>> -                * Make sure the buddy is global too (if it's !none,
> >>>>>> -                * it better already be global)
> >>>>>> -                */
> >>>>>> -               if (pte_none(ptep_get(buddy))) {
> >>>>>> -#ifdef CONFIG_SMP
> >>>>>> -                       /*
> >>>>>> -                        * For SMP, multiple CPUs can race, so we need
> >>>>>> -                        * to do this atomically.
> >>>>>> -                        */
> >>>>>> -                       __asm__ __volatile__(
> >>>>>> -                       __AMOR "$zero, %[global], %[buddy] \n"
> >>>>>> -                       : [buddy] "+ZB" (buddy->pte)
> >>>>>> -                       : [global] "r" (_PAGE_GLOBAL)
> >>>>>> -                       : "memory");
> >>>>>> -
> >>>>>> -                       DBAR(0b11000); /* o_wrw = 0b11000 */
> >>>>>> -#else /* !CONFIG_SMP */
> >>>>>> -                       WRITE_ONCE(*buddy, __pte(pte_val(ptep_get(buddy)) | _PAGE_GLOBAL));
> >>>>>> -#endif /* CONFIG_SMP */
> >>>>>> -               }
> >>>>>> -       }
> >>>>>> +       DBAR(0b11000); /* o_wrw = 0b11000 */
> >>>>>>      }
> >>>>>>
> >>>>>> No, please hold on. This issue exists about twenty years, Do we need be
> >>>>>> in such a hurry now?
> >>>>>>
> >>>>>> why is DBAR(0b11000) added in set_pte()?
> >>>>> It exists before, not added by this patch. The reason is explained in
> >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.12-rc3&id=f93f67d06b1023313ef1662eac490e29c025c030
> >>>> why speculative accesses may cause spurious page fault in kernel space
> >>>> with PTE enabled?  speculative accesses exists anywhere, it does not
> >>>> cause spurious page fault.
> >>> Confirmed by Ruiyang Wu, and even if DBAR(0b11000) is wrong, that
> >>> means another patch's mistake, not this one. This one just keeps the
> >>> old behavior.
> >>> +CC Ruiyang Wu here.
> >> Also from Ruiyang Wu, the information is that speculative accesses may
> >> insert stale TLB, however no page fault exception.
> >>
> >> So adding barrier in set_pte() does not prevent speculative accesses.
> >> And you write patch here, however do not know the actual reason?
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.12-rc3&id=f93f67d06b1023313ef1662eac490e29c025c030
> > I have CCed Ruiyang, whether the description is correct can be judged by him.
>
> There are some problems to add barrier() in set_pte():
>
> 1. There is such issue only for HW ptw enabled and kernel address space,
> is that? Also it may be two heavy to add barrier in set_pte(), comparing
> to do this in flush_cache_vmap().
So adding a barrier in set_pte() may not be the best solution for
performance, but you cannot say it is a wrong solution. And yes, we only
need to care about the kernel space, which is also the old behavior
before this patch, so set_pte() should be:

static inline void set_pte(pte_t *ptep, pte_t pteval)
{
        WRITE_ONCE(*ptep, pteval);
#ifdef CONFIG_SMP
        if (pte_val(pteval) & _PAGE_GLOBAL)
                DBAR(0b11000); /* o_wrw = 0b11000 */
#endif
}

Putting a dbar unconditionally in set_pte() is my mistake, I'm sorry for that.

>
> 2. LoongArch is different with other other architectures, two pages are
> included in one TLB entry. If there is two consecutive page mapped and
> memory access, there will page fault for the second memory access. Such
> as:
>     addr1 =percpu_alloc(pagesize);
>     val1 = *(int *)addr1;
>       // With page table walk, addr1 is present and addr2 is pte_none
>       // TLB entry includes valid pte for addr1, invalid pte for addr2
>     addr2 =percpu_alloc(pagesize); // will not flush tlb in first time
>     val2 = *(int *)addr2;
>       // With page table walk, addr1 is present and addr2 is present also
>       // TLB entry includes valid pte for addr1, invalid pte for addr2
>     So there will be page fault when accessing address addr2
>
> There there is the same problem with user address space. By the way,
> there is HW prefetching technology, negative effective of HW prefetching
> technology will be tlb added. So there is potential page fault if memory
> is allocated and accessed in the first time.
As discussed internally, there may be three problems related to
speculative access: 1) a load/store after set_pte() is reordered before
it, which can be prevented by dbar; 2) an instruction fetch after
set_pte() is reordered before it, which can be prevented by ibar; 3) the
buddy tlb problem you described here, which, if I understand Ruiyang's
explanation correctly, can only be prevented by the filter in
do_page_fault().
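
To make 3) concrete, a rough sketch of what such a filter could look
like (illustrative only, not the exact code in that commit; the caller
in do_page_fault() would only use it for kernel addresses): if the
faulting kernel address already has a present pte in the kernel page
table, the fault is spurious and it is enough to drop the stale local
TLB entry.

/* Illustrative sketch: filter spurious kernel faults from stale TLB entries. */
static bool spurious_kernel_fault(unsigned long address)
{
	pgd_t *pgd = pgd_offset_k(address);
	p4d_t *p4d;
	pud_t *pud;
	pmd_t *pmd;
	pte_t *pte;

	if (pgd_none(pgdp_get(pgd)))
		return false;
	p4d = p4d_offset(pgd, address);
	if (p4d_none(p4dp_get(p4d)))
		return false;
	pud = pud_offset(p4d, address);
	if (pud_none(pudp_get(pud)))
		return false;
	pmd = pmd_offset(pud, address);
	if (pmd_none(pmdp_get(pmd)))
		return false;
	pte = pte_offset_kernel(pmd, address);
	if (!pte_present(ptep_get(pte)))
		return false;

	/* The mapping is already valid, so only the local TLB entry is stale. */
	local_flush_tlb_one(address);
	return true;
}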

From experiments, without the patch "LoongArch: Improve hardware page
table walker" there are about 80 spurious page faults during boot, and
the number keeps increasing during stress tests. After that patch, which
adds a dbar to set_pte(), we cannot observe spurious page faults anymore.
Of course this doesn't mean 2) and 3) don't exist, but we can at least
say 1) is the main case. On this basis, in "LoongArch: Improve hardware
page table walker" we use a relatively cheap dbar (compared with ibar) to
prevent the main case, and add a filter to handle 2) and 3). Such a
solution is reasonable.


>
> 3. For speculative execution, if it is user address, there is eret from
> syscall. eret will rollback all speculative execution instruction. So it
> is only problem for speculative execution. And how to verify whether it
> is the problem of speculative execution or it is the problem of clause 2?
As described above, if spurious page faults still exist after adding
dbar to set_pte(), it may be a problem of clause 2 (case 3 in my
description), otherwise it is not a problem of clause 2.

Finally, this patch itself is attempting to solve the concurrency
problem around _PAGE_GLOBAL, so adding pte_alloc_one_kernel() and
removing the buddy handling in set_pte() are what it needs. However, it
shouldn't touch the dbar logic in set_pte(), whether "LoongArch:
Improve hardware page table walker" is right or wrong.


Huacai

>
> Regards
> Bibo Mao
>
>
> >
> > Huacai
> >
> >>
> >> Bibo Mao
> >>>
> >>> Huacai
> >>>
> >>>>
> >>>> Obvious you do not it and you write wrong patch.
> >>>>
> >>>>>
> >>>>> Huacai
> >>>>>
> >>>>>>
> >>>>>> Regards
> >>>>>> Bibo Mao
> >>>>>>> Huacai
> >>>>>>>
> >>>>>>> On Mon, Oct 14, 2024 at 11:59 AM Bibo Mao <maobibo@loongson.cn> wrote:
> >>>>>>>>
> >>>>>>>> Unlike general architectures, there are two pages in one TLB entry
> >>>>>>>> on LoongArch system. For kernel space, it requires both two pte
> >>>>>>>> entries with PAGE_GLOBAL bit set, else HW treats it as non-global
> >>>>>>>> tlb, there will be potential problems if tlb entry for kernel space
> >>>>>>>> is not global. Such as fail to flush kernel tlb with function
> >>>>>>>> local_flush_tlb_kernel_range() which only flush tlb with global bit.
> >>>>>>>>
> >>>>>>>> With function kernel_pte_init() added, it can be used to init pte
> >>>>>>>> table when it is created for kernel address space, and the default
> >>>>>>>> initial pte value is PAGE_GLOBAL rather than zero at beginning.
> >>>>>>>>
> >>>>>>>> Kernel address space areas includes fixmap, percpu, vmalloc, kasan
> >>>>>>>> and vmemmap areas set default pte entry with PAGE_GLOBAL set.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> >>>>>>>> ---
> >>>>>>>>      arch/loongarch/include/asm/pgalloc.h | 13 +++++++++++++
> >>>>>>>>      arch/loongarch/include/asm/pgtable.h |  1 +
> >>>>>>>>      arch/loongarch/mm/init.c             |  4 +++-
> >>>>>>>>      arch/loongarch/mm/kasan_init.c       |  4 +++-
> >>>>>>>>      arch/loongarch/mm/pgtable.c          | 22 ++++++++++++++++++++++
> >>>>>>>>      include/linux/mm.h                   |  1 +
> >>>>>>>>      mm/kasan/init.c                      |  8 +++++++-
> >>>>>>>>      mm/sparse-vmemmap.c                  |  5 +++++
> >>>>>>>>      8 files changed, 55 insertions(+), 3 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/arch/loongarch/include/asm/pgalloc.h b/arch/loongarch/include/asm/pgalloc.h
> >>>>>>>> index 4e2d6b7ca2ee..b2698c03dc2c 100644
> >>>>>>>> --- a/arch/loongarch/include/asm/pgalloc.h
> >>>>>>>> +++ b/arch/loongarch/include/asm/pgalloc.h
> >>>>>>>> @@ -10,8 +10,21 @@
> >>>>>>>>
> >>>>>>>>      #define __HAVE_ARCH_PMD_ALLOC_ONE
> >>>>>>>>      #define __HAVE_ARCH_PUD_ALLOC_ONE
> >>>>>>>> +#define __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
> >>>>>>>>      #include <asm-generic/pgalloc.h>
> >>>>>>>>
> >>>>>>>> +static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
> >>>>>>>> +{
> >>>>>>>> +       pte_t *pte;
> >>>>>>>> +
> >>>>>>>> +       pte = (pte_t *) __get_free_page(GFP_KERNEL);
> >>>>>>>> +       if (!pte)
> >>>>>>>> +               return NULL;
> >>>>>>>> +
> >>>>>>>> +       kernel_pte_init(pte);
> >>>>>>>> +       return pte;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>>      static inline void pmd_populate_kernel(struct mm_struct *mm,
> >>>>>>>>                                            pmd_t *pmd, pte_t *pte)
> >>>>>>>>      {
> >>>>>>>> diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
> >>>>>>>> index 9965f52ef65b..22e3a8f96213 100644
> >>>>>>>> --- a/arch/loongarch/include/asm/pgtable.h
> >>>>>>>> +++ b/arch/loongarch/include/asm/pgtable.h
> >>>>>>>> @@ -269,6 +269,7 @@ extern void set_pmd_at(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp, pm
> >>>>>>>>      extern void pgd_init(void *addr);
> >>>>>>>>      extern void pud_init(void *addr);
> >>>>>>>>      extern void pmd_init(void *addr);
> >>>>>>>> +extern void kernel_pte_init(void *addr);
> >>>>>>>>
> >>>>>>>>      /*
> >>>>>>>>       * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
> >>>>>>>> diff --git a/arch/loongarch/mm/init.c b/arch/loongarch/mm/init.c
> >>>>>>>> index 8a87a482c8f4..9f26e933a8a3 100644
> >>>>>>>> --- a/arch/loongarch/mm/init.c
> >>>>>>>> +++ b/arch/loongarch/mm/init.c
> >>>>>>>> @@ -198,9 +198,11 @@ pte_t * __init populate_kernel_pte(unsigned long addr)
> >>>>>>>>             if (!pmd_present(pmdp_get(pmd))) {
> >>>>>>>>                     pte_t *pte;
> >>>>>>>>
> >>>>>>>> -               pte = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
> >>>>>>>> +               pte = memblock_alloc_raw(PAGE_SIZE, PAGE_SIZE);
> >>>>>>>>                     if (!pte)
> >>>>>>>>                             panic("%s: Failed to allocate memory\n", __func__);
> >>>>>>>> +
> >>>>>>>> +               kernel_pte_init(pte);
> >>>>>>>>                     pmd_populate_kernel(&init_mm, pmd, pte);
> >>>>>>>>             }
> >>>>>>>>
> >>>>>>>> diff --git a/arch/loongarch/mm/kasan_init.c b/arch/loongarch/mm/kasan_init.c
> >>>>>>>> index 427d6b1aec09..34988573b0d5 100644
> >>>>>>>> --- a/arch/loongarch/mm/kasan_init.c
> >>>>>>>> +++ b/arch/loongarch/mm/kasan_init.c
> >>>>>>>> @@ -152,6 +152,8 @@ static void __init kasan_pte_populate(pmd_t *pmdp, unsigned long addr,
> >>>>>>>>                     phys_addr_t page_phys = early ?
> >>>>>>>>                                             __pa_symbol(kasan_early_shadow_page)
> >>>>>>>>                                                   : kasan_alloc_zeroed_page(node);
> >>>>>>>> +               if (!early)
> >>>>>>>> +                       kernel_pte_init(__va(page_phys));
> >>>>>>>>                     next = addr + PAGE_SIZE;
> >>>>>>>>                     set_pte(ptep, pfn_pte(__phys_to_pfn(page_phys), PAGE_KERNEL));
> >>>>>>>>             } while (ptep++, addr = next, addr != end && __pte_none(early, ptep_get(ptep)));
> >>>>>>>> @@ -287,7 +289,7 @@ void __init kasan_init(void)
> >>>>>>>>                     set_pte(&kasan_early_shadow_pte[i],
> >>>>>>>>                             pfn_pte(__phys_to_pfn(__pa_symbol(kasan_early_shadow_page)), PAGE_KERNEL_RO));
> >>>>>>>>
> >>>>>>>> -       memset(kasan_early_shadow_page, 0, PAGE_SIZE);
> >>>>>>>> +       kernel_pte_init(kasan_early_shadow_page);
> >>>>>>>>             csr_write64(__pa_symbol(swapper_pg_dir), LOONGARCH_CSR_PGDH);
> >>>>>>>>             local_flush_tlb_all();
> >>>>>>>>
> >>>>>>>> diff --git a/arch/loongarch/mm/pgtable.c b/arch/loongarch/mm/pgtable.c
> >>>>>>>> index eb6a29b491a7..228ffc1db0a3 100644
> >>>>>>>> --- a/arch/loongarch/mm/pgtable.c
> >>>>>>>> +++ b/arch/loongarch/mm/pgtable.c
> >>>>>>>> @@ -38,6 +38,28 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
> >>>>>>>>      }
> >>>>>>>>      EXPORT_SYMBOL_GPL(pgd_alloc);
> >>>>>>>>
> >>>>>>>> +void kernel_pte_init(void *addr)
> >>>>>>>> +{
> >>>>>>>> +       unsigned long *p, *end;
> >>>>>>>> +       unsigned long entry;
> >>>>>>>> +
> >>>>>>>> +       entry = (unsigned long)_PAGE_GLOBAL;
> >>>>>>>> +       p = (unsigned long *)addr;
> >>>>>>>> +       end = p + PTRS_PER_PTE;
> >>>>>>>> +
> >>>>>>>> +       do {
> >>>>>>>> +               p[0] = entry;
> >>>>>>>> +               p[1] = entry;
> >>>>>>>> +               p[2] = entry;
> >>>>>>>> +               p[3] = entry;
> >>>>>>>> +               p[4] = entry;
> >>>>>>>> +               p += 8;
> >>>>>>>> +               p[-3] = entry;
> >>>>>>>> +               p[-2] = entry;
> >>>>>>>> +               p[-1] = entry;
> >>>>>>>> +       } while (p != end);
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>>      void pgd_init(void *addr)
> >>>>>>>>      {
> >>>>>>>>             unsigned long *p, *end;
> >>>>>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >>>>>>>> index ecf63d2b0582..6909fe059a2c 100644
> >>>>>>>> --- a/include/linux/mm.h
> >>>>>>>> +++ b/include/linux/mm.h
> >>>>>>>> @@ -3818,6 +3818,7 @@ void *sparse_buffer_alloc(unsigned long size);
> >>>>>>>>      struct page * __populate_section_memmap(unsigned long pfn,
> >>>>>>>>                     unsigned long nr_pages, int nid, struct vmem_altmap *altmap,
> >>>>>>>>                     struct dev_pagemap *pgmap);
> >>>>>>>> +void kernel_pte_init(void *addr);
> >>>>>>>>      void pmd_init(void *addr);
> >>>>>>>>      void pud_init(void *addr);
> >>>>>>>>      pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
> >>>>>>>> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
> >>>>>>>> index 89895f38f722..ac607c306292 100644
> >>>>>>>> --- a/mm/kasan/init.c
> >>>>>>>> +++ b/mm/kasan/init.c
> >>>>>>>> @@ -106,6 +106,10 @@ static void __ref zero_pte_populate(pmd_t *pmd, unsigned long addr,
> >>>>>>>>             }
> >>>>>>>>      }
> >>>>>>>>
> >>>>>>>> +void __weak __meminit kernel_pte_init(void *addr)
> >>>>>>>> +{
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>>      static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
> >>>>>>>>                                     unsigned long end)
> >>>>>>>>      {
> >>>>>>>> @@ -126,8 +130,10 @@ static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
> >>>>>>>>
> >>>>>>>>                             if (slab_is_available())
> >>>>>>>>                                     p = pte_alloc_one_kernel(&init_mm);
> >>>>>>>> -                       else
> >>>>>>>> +                       else {
> >>>>>>>>                                     p = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
> >>>>>>>> +                               kernel_pte_init(p);
> >>>>>>>> +                       }
> >>>>>>>>                             if (!p)
> >>>>>>>>                                     return -ENOMEM;
> >>>>>>>>
> >>>>>>>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> >>>>>>>> index edcc7a6b0f6f..c0388b2e959d 100644
> >>>>>>>> --- a/mm/sparse-vmemmap.c
> >>>>>>>> +++ b/mm/sparse-vmemmap.c
> >>>>>>>> @@ -184,6 +184,10 @@ static void * __meminit vmemmap_alloc_block_zero(unsigned long size, int node)
> >>>>>>>>             return p;
> >>>>>>>>      }
> >>>>>>>>
> >>>>>>>> +void __weak __meminit kernel_pte_init(void *addr)
> >>>>>>>> +{
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>>      pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
> >>>>>>>>      {
> >>>>>>>>             pmd_t *pmd = pmd_offset(pud, addr);
> >>>>>>>> @@ -191,6 +195,7 @@ pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
> >>>>>>>>                     void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
> >>>>>>>>                     if (!p)
> >>>>>>>>                             return NULL;
> >>>>>>>> +               kernel_pte_init(p);
> >>>>>>>>                     pmd_populate_kernel(&init_mm, pmd, p);
> >>>>>>>>             }
> >>>>>>>>             return pmd;
> >>>>>>>> --
> >>>>>>>> 2.39.3
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/3] LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space
  2024-10-21 10:13                   ` Huacai Chen
@ 2024-10-22  1:39                     ` maobibo
  2024-10-22  1:56                       ` Huacai Chen
  0 siblings, 1 reply; 21+ messages in thread
From: maobibo @ 2024-10-22  1:39 UTC (permalink / raw)
  To: Huacai Chen
  Cc: wuruiyang, Andrey Ryabinin, Andrew Morton, David Hildenbrand,
	Barry Song, loongarch, linux-kernel, kasan-dev, linux-mm



On 2024/10/21 下午6:13, Huacai Chen wrote:
> On Mon, Oct 21, 2024 at 9:23 AM maobibo <maobibo@loongson.cn> wrote:
>>
>>
>>
>> On 2024/10/18 下午2:32, Huacai Chen wrote:
>>> On Fri, Oct 18, 2024 at 2:23 PM maobibo <maobibo@loongson.cn> wrote:
>>>>
>>>>
>>>>
>>>> On 2024/10/18 下午12:23, Huacai Chen wrote:
>>>>> On Fri, Oct 18, 2024 at 12:16 PM maobibo <maobibo@loongson.cn> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2024/10/18 下午12:11, Huacai Chen wrote:
>>>>>>> On Fri, Oct 18, 2024 at 11:44 AM maobibo <maobibo@loongson.cn> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2024/10/18 上午11:14, Huacai Chen wrote:
>>>>>>>>> Hi, Bibo,
>>>>>>>>>
>>>>>>>>> I applied this patch but drop the part of arch/loongarch/mm/kasan_init.c:
>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson.git/commit/?h=loongarch-next&id=15832255e84494853f543b4c70ced50afc403067
>>>>>>>>>
>>>>>>>>> Because kernel_pte_init() should operate on page-table pages, not on
>>>>>>>>> data pages. You have already handle page-table page in
>>>>>>>>> mm/kasan/init.c, and if we don't drop the modification on data pages
>>>>>>>>> in arch/loongarch/mm/kasan_init.c, the kernel fail to boot if KASAN is
>>>>>>>>> enabled.
>>>>>>>>>
>>>>>>>> static inline void set_pte(pte_t *ptep, pte_t pteval)
>>>>>>>>       {
>>>>>>>>             WRITE_ONCE(*ptep, pteval);
>>>>>>>> -
>>>>>>>> -       if (pte_val(pteval) & _PAGE_GLOBAL) {
>>>>>>>> -               pte_t *buddy = ptep_buddy(ptep);
>>>>>>>> -               /*
>>>>>>>> -                * Make sure the buddy is global too (if it's !none,
>>>>>>>> -                * it better already be global)
>>>>>>>> -                */
>>>>>>>> -               if (pte_none(ptep_get(buddy))) {
>>>>>>>> -#ifdef CONFIG_SMP
>>>>>>>> -                       /*
>>>>>>>> -                        * For SMP, multiple CPUs can race, so we need
>>>>>>>> -                        * to do this atomically.
>>>>>>>> -                        */
>>>>>>>> -                       __asm__ __volatile__(
>>>>>>>> -                       __AMOR "$zero, %[global], %[buddy] \n"
>>>>>>>> -                       : [buddy] "+ZB" (buddy->pte)
>>>>>>>> -                       : [global] "r" (_PAGE_GLOBAL)
>>>>>>>> -                       : "memory");
>>>>>>>> -
>>>>>>>> -                       DBAR(0b11000); /* o_wrw = 0b11000 */
>>>>>>>> -#else /* !CONFIG_SMP */
>>>>>>>> -                       WRITE_ONCE(*buddy, __pte(pte_val(ptep_get(buddy)) | _PAGE_GLOBAL));
>>>>>>>> -#endif /* CONFIG_SMP */
>>>>>>>> -               }
>>>>>>>> -       }
>>>>>>>> +       DBAR(0b11000); /* o_wrw = 0b11000 */
>>>>>>>>       }
>>>>>>>>
>>>>>>>> No, please hold on. This issue exists about twenty years, Do we need be
>>>>>>>> in such a hurry now?
>>>>>>>>
>>>>>>>> why is DBAR(0b11000) added in set_pte()?
>>>>>>> It exists before, not added by this patch. The reason is explained in
>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.12-rc3&id=f93f67d06b1023313ef1662eac490e29c025c030
>>>>>> why speculative accesses may cause spurious page fault in kernel space
>>>>>> with PTE enabled?  speculative accesses exists anywhere, it does not
>>>>>> cause spurious page fault.
>>>>> Confirmed by Ruiyang Wu, and even if DBAR(0b11000) is wrong, that
>>>>> means another patch's mistake, not this one. This one just keeps the
>>>>> old behavior.
>>>>> +CC Ruiyang Wu here.
>>>> Also from Ruiyang Wu, the information is that speculative accesses may
>>>> insert stale TLB, however no page fault exception.
>>>>
>>>> So adding barrier in set_pte() does not prevent speculative accesses.
>>>> And you write patch here, however do not know the actual reason?
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.12-rc3&id=f93f67d06b1023313ef1662eac490e29c025c030
>>> I have CCed Ruiyang, whether the description is correct can be judged by him.
>>
>> There are some problems to add barrier() in set_pte():
>>
>> 1. There is such issue only for HW ptw enabled and kernel address space,
>> is that? Also it may be two heavy to add barrier in set_pte(), comparing
>> to do this in flush_cache_vmap().
> So adding a barrier in set_pte() may not be the best solution for
> performance, but you cannot say it is a wrong solution. And yes, we
> can only care the kernel space, which is also the old behavior before
> this patch, so set_pte() should be:
> 
> static inline void set_pte(pte_t *ptep, pte_t pteval)
> {
>          WRITE_ONCE(*ptep, pteval);
> #ifdef CONFIG_SMP
>          if (pte_val(pteval) & _PAGE_GLOBAL)
cpu_has_ptw also seems to be needed here, if this only matters with the
HW page table walker; e.g. something like the sketch below.
>                  DBAR(0b11000); /* o_wrw = 0b11000 */
> #endif
> }
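
Something like this sketch is what I mean (the cpu_has_ptw check is only
an illustration, not tested):

static inline void set_pte(pte_t *ptep, pte_t pteval)
{
        WRITE_ONCE(*ptep, pteval);
#ifdef CONFIG_SMP
        /* only the hardware page table walker needs the barrier for global kernel ptes */
        if (cpu_has_ptw && (pte_val(pteval) & _PAGE_GLOBAL))
                DBAR(0b11000); /* o_wrw = 0b11000 */
#endif
}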
> 
> Putting a dbar unconditionally in set_pte() is my mistake, I'm sorry for  that.
> 
>>
>> 2. LoongArch is different with other other architectures, two pages are
>> included in one TLB entry. If there is two consecutive page mapped and
>> memory access, there will page fault for the second memory access. Such
>> as:
>>      addr1 =percpu_alloc(pagesize);
>>      val1 = *(int *)addr1;
>>        // With page table walk, addr1 is present and addr2 is pte_none
>>        // TLB entry includes valid pte for addr1, invalid pte for addr2
>>      addr2 =percpu_alloc(pagesize); // will not flush tlb in first time
>>      val2 = *(int *)addr2;
>>        // With page table walk, addr1 is present and addr2 is present also
>>        // TLB entry includes valid pte for addr1, invalid pte for addr2
>>      So there will be page fault when accessing address addr2
>>
>> There is the same problem with the user address space. By the way, there
>> is HW prefetching technology, and a negative effect of HW prefetching is
>> that extra TLB entries may be added. So there is a potential page fault
>> when memory is allocated and then accessed for the first time.
> As discussed internally, there may be three problems related to
> speculative access in detail: 1) a load/store after set_pte() is
> reordered before it, which can be prevented by dbar; 2) an instruction
> fetch after set_pte() is reordered before it, which can be prevented by
> ibar; 3) the buddy TLB problem you described here, which, if I understand
> Ruiyang's explanation correctly, can only be prevented by the filter in
> do_page_fault().
> 
>  From experiments, without the patch "LoongArch: Improve hardware page
> table walker", there are about 80 spurious page faults during boot, and
> the count increases continually during stress tests. And after that
> patch, which adds a dbar to set_pte(), we cannot observe spurious page
> faults anymore. Of course this doesn't mean 2) and 3) don't exist, but
Good experimental result. Could you share the page fault counting code and
the test cases with me?

> we can at least say 1) is the main case. On this basis, in "LoongArch:
> Improve hardware page table walker" we use a relatively cheap dbar
> (compared to ibar) to prevent the main case, and add a filter to
> handle 2) and 3). Such a solution is reasonable.
> 
> 
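To make the "filter in do_page_fault()" concrete: the idea is that when the
kernel page table already holds a valid pte for the faulting address, the
fault is spurious and the handler can simply return after refilling the TLB.
A rough sketch of such a check, with illustrative names only, not the actual
kernel code:

static bool spurious_kernel_fault(unsigned long address)
{
	pgd_t *pgd = pgd_offset_k(address);
	p4d_t *p4d;
	pud_t *pud;
	pmd_t *pmd;
	pte_t *pte;

	/* walk the kernel page table; huge mappings are ignored in this sketch */
	if (pgd_none(*pgd))
		return false;
	p4d = p4d_offset(pgd, address);
	if (p4d_none(*p4d))
		return false;
	pud = pud_offset(p4d, address);
	if (pud_none(*pud))
		return false;
	pmd = pmd_offset(pud, address);
	if (pmd_none(pmdp_get(pmd)))
		return false;
	pte = pte_offset_kernel(pmd, address);

	/* pte already valid: the fault came from a stale TLB entry or speculation */
	return pte_present(ptep_get(pte));
}
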
>>
>> 3. For speculative execution, if it is a user address, there is an eret
>> from the syscall, and eret will roll back all speculatively executed
>> instructions. So it is only a problem for speculative execution in kernel
>> space. And how can we verify whether it is a problem of speculative
>> execution or a problem of clause 2?
> As described above, if spurious page faults still exist after adding
> dbar to set_pte(), it may be a problem of clause 2 (case 3 in my
> description), otherwise it is not a problem of clause 2.
> 
> Finally, this patch itself is attempting to solve the concurrency
> problem around _PAGE_GLOBAL, so adding pte_alloc_one_kernel() and
> removing the buddy stuff in set_pte() are what it needs. However it
> shouldn't touch the logic of the dbar in set_pte(), whether "LoongArch:
> Improve hardware page table walker" is right or wrong.
Yes, I agree. We can discuss the set_pte() issue later. Keeping this patch
simple and only solving the concurrency problem is OK:
https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson.git/diff/mm/kasan/init.c?h=loongarch-next&id=15832255e84494853f543b4c70ced50afc403067

Regards
Bibo Mao



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/3] LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space
  2024-10-22  1:39                     ` maobibo
@ 2024-10-22  1:56                       ` Huacai Chen
  0 siblings, 0 replies; 21+ messages in thread
From: Huacai Chen @ 2024-10-22  1:56 UTC (permalink / raw)
  To: maobibo
  Cc: wuruiyang, Andrey Ryabinin, Andrew Morton, David Hildenbrand,
	Barry Song, loongarch, linux-kernel, kasan-dev, linux-mm

On Tue, Oct 22, 2024 at 9:40 AM maobibo <maobibo@loongson.cn> wrote:
> >  From experiments, without the patch "LoongArch: Improve hardware page
> > table walker", there are about 80 spurious page faults during boot, and
> > the count increases continually during stress tests. And after that
> > patch, which adds a dbar to set_pte(), we cannot observe spurious page
> > faults anymore. Of course this doesn't mean 2) and 3) don't exist, but
> Good experimental result. Could you share the page fault counting code and
> the test cases with me?
Counting method:
1. Add a simple printk at the beginning of spurious_fault(), and count
the number of such printk lines in dmesg.
2. Test case: boot Fedora to the desktop, and then run a kernel build
with "make -j8" in the system.
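
For reference, the printk is just a one-liner at the beginning of
spurious_fault(); the message text and variable name below are only for
illustration:

	pr_info("spurious page fault at %#lx\n", address);

and the count is simply the number of such lines in dmesg, e.g. with
dmesg | grep -c "spurious page fault".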

Huacai



^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2024-10-22  1:57 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-10-14  3:58 [PATCH v2 0/3] LoongArch: Fix vmalloc test issue Bibo Mao
2024-10-14  3:58 ` [PATCH v2 1/3] LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space Bibo Mao
2024-10-18  3:14   ` Huacai Chen
2024-10-18  3:44     ` maobibo
2024-10-18  4:11       ` Huacai Chen
2024-10-18  4:16         ` maobibo
2024-10-18  4:23           ` Huacai Chen
2024-10-18  6:23             ` maobibo
2024-10-18  6:32               ` Huacai Chen
2024-10-21  1:22                 ` maobibo
2024-10-21 10:13                   ` Huacai Chen
2024-10-22  1:39                     ` maobibo
2024-10-22  1:56                       ` Huacai Chen
2024-10-14  3:58 ` [PATCH v2 2/3] LoongArch: Add barrier between set_pte and memory access Bibo Mao
2024-10-14  6:31   ` Huacai Chen
2024-10-15  2:53     ` maobibo
2024-10-15 12:27       ` Huacai Chen
2024-10-16  6:09         ` maobibo
2024-10-16  7:30           ` Huacai Chen
2024-10-14  3:58 ` [PATCH v2 3/3] LoongArch: Remove pte buddy set with set_pte and pte_clear function Bibo Mao
2024-10-14  6:33   ` Huacai Chen
