* [PATCH 0/4] riscv: support fast gup
@ 2023-12-19 17:50 Jisheng Zhang
2023-12-19 17:50 ` [PATCH 1/4] riscv: tlb: fix __p*d_free_tlb() Jisheng Zhang
` (4 more replies)
0 siblings, 5 replies; 19+ messages in thread
From: Jisheng Zhang @ 2023-12-19 17:50 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Will Deacon,
Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra
Cc: linux-riscv, linux-kernel, linux-arch, linux-mm
This series adds fast gup support to riscv.
The first patch fixes a bug in __p*d_free_tlb(). Per the riscv
privileged spec, if a non-leaf PTE, i.e. a pmd, pud or p4d, is
modified, an sfence.vma is a must.
The 2nd patch is a preparation patch.
The last two patches do the real work:
In order to implement fast gup we need to ensure that the page
table walker is protected from page table pages being freed from
under it.
The riscv situation is more complicated than on other architectures:
some riscv platforms use IPIs to perform TLB shootdown (for example,
platforms which support AIA; riscv_ipi_for_rfence is usually true on
these platforms), while other riscv platforms rely on the SBI to
perform TLB shootdown (riscv_ipi_for_rfence is usually false there).
To keep software pagetable walkers safe in the SBI case we switch to
RCU-based table free (MMU_GATHER_RCU_TABLE_FREE). See the comment
below 'ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE' in
include/asm-generic/tlb.h for more details.
The third patch enables MMU_GATHER_RCU_TABLE_FREE and then uses
*tlb_remove_page_ptdesc() on platforms which use IPIs to perform
TLB shootdown;
*tlb_remove_ptdesc() on platforms which use the SBI to perform TLB
shootdown.
In both cases, disabling interrupts blocks the free and protects the
fast gup page walker.
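For reference, the pattern being protected looks roughly like this; a
condensed, paraphrased view of lockless_pages_from_mm() in mm/gup.c
(declarations and error handling elided, not verbatim):
	/* Condensed sketch of mm/gup.c:lockless_pages_from_mm() */
	local_irq_save(flags);
	/*
	 * Walk the page tables without holding mmap_lock.  With IPI-based
	 * TLB shootdown, the freeing CPU cannot finish broadcasting its
	 * IPI while this CPU has interrupts disabled.  With RCU-based
	 * table free, this IRQs-off region acts as an RCU read-side
	 * critical section (sched-RCU semantics), so the deferred free
	 * cannot run here either.
	 */
	gup_pgd_range(start, end, gup_flags, pages, &nr_pinned);
	local_irq_restore(flags);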
So after the 3rd patch, everything is well prepared; let's select
HAVE_FAST_GUP if MMU.
Jisheng Zhang (4):
riscv: tlb: fix __p*d_free_tlb()
riscv: tlb: convert __p*d_free_tlb() to inline functions
riscv: enable MMU_GATHER_RCU_TABLE_FREE for SMP && MMU
riscv: enable HAVE_FAST_GUP if MMU
arch/riscv/Kconfig | 2 ++
arch/riscv/include/asm/pgalloc.h | 53 +++++++++++++++++++++++++++-----
arch/riscv/include/asm/pgtable.h | 6 ++++
arch/riscv/include/asm/tlb.h | 18 +++++++++++
4 files changed, 71 insertions(+), 8 deletions(-)
--
2.40.0
* [PATCH 1/4] riscv: tlb: fix __p*d_free_tlb()
2023-12-19 17:50 [PATCH 0/4] riscv: support fast gup Jisheng Zhang
@ 2023-12-19 17:50 ` Jisheng Zhang
2023-12-31 6:21 ` Alexandre Ghiti
2024-01-04 10:55 ` Alexandre Ghiti
2023-12-19 17:50 ` [PATCH 2/4] riscv: tlb: convert __p*d_free_tlb() to inline functions Jisheng Zhang
` (3 subsequent siblings)
4 siblings, 2 replies; 19+ messages in thread
From: Jisheng Zhang @ 2023-12-19 17:50 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Will Deacon,
Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra
Cc: linux-riscv, linux-kernel, linux-arch, linux-mm
If a non-leaf PTE, i.e. a pmd, pud or p4d, is modified, an sfence.vma
is a must to be safe: imagine an implementation that caches non-leaf
translations in the TLB. Although I haven't met such hardware so far,
it's possible in theory.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
---
arch/riscv/include/asm/pgalloc.h | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
index d169a4f41a2e..a12fb83fa1f5 100644
--- a/arch/riscv/include/asm/pgalloc.h
+++ b/arch/riscv/include/asm/pgalloc.h
@@ -95,7 +95,13 @@ static inline void pud_free(struct mm_struct *mm, pud_t *pud)
__pud_free(mm, pud);
}
-#define __pud_free_tlb(tlb, pud, addr) pud_free((tlb)->mm, pud)
+#define __pud_free_tlb(tlb, pud, addr) \
+do { \
+ if (pgtable_l4_enabled) { \
+ pagetable_pud_dtor(virt_to_ptdesc(pud)); \
+ tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pud)); \
+ } \
+} while (0)
#define p4d_alloc_one p4d_alloc_one
static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
@@ -124,7 +130,11 @@ static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
__p4d_free(mm, p4d);
}
-#define __p4d_free_tlb(tlb, p4d, addr) p4d_free((tlb)->mm, p4d)
+#define __p4d_free_tlb(tlb, p4d, addr) \
+do { \
+ if (pgtable_l5_enabled) \
+ tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(p4d)); \
+} while (0)
#endif /* __PAGETABLE_PMD_FOLDED */
static inline void sync_kernel_mappings(pgd_t *pgd)
@@ -149,7 +159,11 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
#ifndef __PAGETABLE_PMD_FOLDED
-#define __pmd_free_tlb(tlb, pmd, addr) pmd_free((tlb)->mm, pmd)
+#define __pmd_free_tlb(tlb, pmd, addr) \
+do { \
+ pagetable_pmd_dtor(virt_to_ptdesc(pmd)); \
+ tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pmd)); \
+} while (0)
#endif /* __PAGETABLE_PMD_FOLDED */
--
2.40.0
* [PATCH 2/4] riscv: tlb: convert __p*d_free_tlb() to inline functions
2023-12-19 17:50 [PATCH 0/4] riscv: support fast gup Jisheng Zhang
2023-12-19 17:50 ` [PATCH 1/4] riscv: tlb: fix __p*d_free_tlb() Jisheng Zhang
@ 2023-12-19 17:50 ` Jisheng Zhang
2023-12-20 2:59 ` [External] " yunhui cui
2023-12-31 6:24 ` Alexandre Ghiti
2023-12-19 17:50 ` [PATCH 3/4] riscv: enable MMU_GATHER_RCU_TABLE_FREE for SMP && MMU Jisheng Zhang
` (2 subsequent siblings)
4 siblings, 2 replies; 19+ messages in thread
From: Jisheng Zhang @ 2023-12-19 17:50 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Will Deacon,
Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra
Cc: linux-riscv, linux-kernel, linux-arch, linux-mm
This is to prepare for enabling MMU_GATHER_RCU_TABLE_FREE.
No functionality changes.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
---
arch/riscv/include/asm/pgalloc.h | 54 +++++++++++++++++++-------------
1 file changed, 32 insertions(+), 22 deletions(-)
diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
index a12fb83fa1f5..3c5e3bd15f46 100644
--- a/arch/riscv/include/asm/pgalloc.h
+++ b/arch/riscv/include/asm/pgalloc.h
@@ -95,13 +95,16 @@ static inline void pud_free(struct mm_struct *mm, pud_t *pud)
__pud_free(mm, pud);
}
-#define __pud_free_tlb(tlb, pud, addr) \
-do { \
- if (pgtable_l4_enabled) { \
- pagetable_pud_dtor(virt_to_ptdesc(pud)); \
- tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pud)); \
- } \
-} while (0)
+static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
+ unsigned long addr)
+{
+ if (pgtable_l4_enabled) {
+ struct ptdesc *ptdesc = virt_to_ptdesc(pud);
+
+ pagetable_pud_dtor(ptdesc);
+ tlb_remove_page_ptdesc(tlb, ptdesc);
+ }
+}
#define p4d_alloc_one p4d_alloc_one
static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
@@ -130,11 +133,12 @@ static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
__p4d_free(mm, p4d);
}
-#define __p4d_free_tlb(tlb, p4d, addr) \
-do { \
- if (pgtable_l5_enabled) \
- tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(p4d)); \
-} while (0)
+static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
+ unsigned long addr)
+{
+ if (pgtable_l5_enabled)
+ tlb_remove_page_ptdesc(tlb, virt_to_ptdesc(p4d));
+}
#endif /* __PAGETABLE_PMD_FOLDED */
static inline void sync_kernel_mappings(pgd_t *pgd)
@@ -159,19 +163,25 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
#ifndef __PAGETABLE_PMD_FOLDED
-#define __pmd_free_tlb(tlb, pmd, addr) \
-do { \
- pagetable_pmd_dtor(virt_to_ptdesc(pmd)); \
- tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pmd)); \
-} while (0)
+static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
+ unsigned long addr)
+{
+ struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
+
+ pagetable_pmd_dtor(ptdesc);
+ tlb_remove_page_ptdesc(tlb, ptdesc);
+}
#endif /* __PAGETABLE_PMD_FOLDED */
-#define __pte_free_tlb(tlb, pte, buf) \
-do { \
- pagetable_pte_dtor(page_ptdesc(pte)); \
- tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
-} while (0)
+static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
+ unsigned long addr)
+{
+ struct ptdesc *ptdesc = page_ptdesc(pte);
+
+ pagetable_pte_dtor(ptdesc);
+ tlb_remove_page_ptdesc(tlb, ptdesc);
+}
#endif /* CONFIG_MMU */
#endif /* _ASM_RISCV_PGALLOC_H */
--
2.40.0
* [PATCH 3/4] riscv: enable MMU_GATHER_RCU_TABLE_FREE for SMP && MMU
2023-12-19 17:50 [PATCH 0/4] riscv: support fast gup Jisheng Zhang
2023-12-19 17:50 ` [PATCH 1/4] riscv: tlb: fix __p*d_free_tlb() Jisheng Zhang
2023-12-19 17:50 ` [PATCH 2/4] riscv: tlb: convert __p*d_free_tlb() to inline functions Jisheng Zhang
@ 2023-12-19 17:50 ` Jisheng Zhang
2023-12-31 6:32 ` Alexandre Ghiti
2023-12-19 17:50 ` [PATCH 4/4] riscv: enable HAVE_FAST_GUP if MMU Jisheng Zhang
2024-01-25 21:30 ` [PATCH 0/4] riscv: support fast gup patchwork-bot+linux-riscv
4 siblings, 1 reply; 19+ messages in thread
From: Jisheng Zhang @ 2023-12-19 17:50 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Will Deacon,
Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra
Cc: linux-riscv, linux-kernel, linux-arch, linux-mm
In order to implement fast gup we need to ensure that the page
table walker is protected from page table pages being freed from
under it.
The riscv situation is more complicated than on other architectures:
some riscv platforms use IPIs to perform TLB shootdown (for example,
platforms which support AIA; riscv_ipi_for_rfence is usually true on
these platforms), while other riscv platforms rely on the SBI to
perform TLB shootdown (riscv_ipi_for_rfence is usually false there).
To keep software pagetable walkers safe in the SBI case we switch to
RCU-based table free (MMU_GATHER_RCU_TABLE_FREE). See the comment
below 'ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE' in
include/asm-generic/tlb.h for more details.
This patch enables MMU_GATHER_RCU_TABLE_FREE and then uses
*tlb_remove_page_ptdesc() on platforms which use IPIs to perform
TLB shootdown;
*tlb_remove_ptdesc() on platforms which use the SBI to perform TLB
shootdown.
In both cases, disabling interrupts blocks the free and protects the
fast gup page walker.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
---
arch/riscv/Kconfig | 1 +
arch/riscv/include/asm/pgalloc.h | 23 ++++++++++++++++++-----
arch/riscv/include/asm/tlb.h | 18 ++++++++++++++++++
3 files changed, 37 insertions(+), 5 deletions(-)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 24c1799e2ec4..d3555173d9f4 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -147,6 +147,7 @@ config RISCV
select IRQ_FORCED_THREADING
select KASAN_VMALLOC if KASAN
select LOCK_MM_AND_FIND_VMA
+ select MMU_GATHER_RCU_TABLE_FREE if SMP && MMU
select MODULES_USE_ELF_RELA if MODULES
select MODULE_SECTIONS if MODULES
select OF
diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
index 3c5e3bd15f46..deaf971253a2 100644
--- a/arch/riscv/include/asm/pgalloc.h
+++ b/arch/riscv/include/asm/pgalloc.h
@@ -102,7 +102,10 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
struct ptdesc *ptdesc = virt_to_ptdesc(pud);
pagetable_pud_dtor(ptdesc);
- tlb_remove_page_ptdesc(tlb, ptdesc);
+ if (riscv_use_ipi_for_rfence())
+ tlb_remove_page_ptdesc(tlb, ptdesc);
+ else
+ tlb_remove_ptdesc(tlb, ptdesc);
}
}
@@ -136,8 +139,12 @@ static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
unsigned long addr)
{
- if (pgtable_l5_enabled)
- tlb_remove_page_ptdesc(tlb, virt_to_ptdesc(p4d));
+ if (pgtable_l5_enabled) {
+ if (riscv_use_ipi_for_rfence())
+ tlb_remove_page_ptdesc(tlb, virt_to_ptdesc(p4d));
+ else
+ tlb_remove_ptdesc(tlb, virt_to_ptdesc(p4d));
+ }
}
#endif /* __PAGETABLE_PMD_FOLDED */
@@ -169,7 +176,10 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
pagetable_pmd_dtor(ptdesc);
- tlb_remove_page_ptdesc(tlb, ptdesc);
+ if (riscv_use_ipi_for_rfence())
+ tlb_remove_page_ptdesc(tlb, ptdesc);
+ else
+ tlb_remove_ptdesc(tlb, ptdesc);
}
#endif /* __PAGETABLE_PMD_FOLDED */
@@ -180,7 +190,10 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
struct ptdesc *ptdesc = page_ptdesc(pte);
pagetable_pte_dtor(ptdesc);
- tlb_remove_page_ptdesc(tlb, ptdesc);
+ if (riscv_use_ipi_for_rfence())
+ tlb_remove_page_ptdesc(tlb, ptdesc);
+ else
+ tlb_remove_ptdesc(tlb, ptdesc);
}
#endif /* CONFIG_MMU */
diff --git a/arch/riscv/include/asm/tlb.h b/arch/riscv/include/asm/tlb.h
index 1eb5682b2af6..a0b8b853503f 100644
--- a/arch/riscv/include/asm/tlb.h
+++ b/arch/riscv/include/asm/tlb.h
@@ -10,6 +10,24 @@ struct mmu_gather;
static void tlb_flush(struct mmu_gather *tlb);
+#ifdef CONFIG_MMU
+#include <linux/swap.h>
+
+/*
+ * While riscv platforms with riscv_ipi_for_rfence as true require an IPI to
+ * perform TLB shootdown, some platforms with riscv_ipi_for_rfence as false use
+ * SBI to perform TLB shootdown. To keep software pagetable walkers safe in this
+ * case we switch to RCU based table free (MMU_GATHER_RCU_TABLE_FREE). See the
+ * comment below 'ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE' in include/asm-generic/tlb.h
+ * for more details.
+ */
+static inline void __tlb_remove_table(void *table)
+{
+ free_page_and_swap_cache(table);
+}
+
+#endif /* CONFIG_MMU */
+
#define tlb_flush tlb_flush
#include <asm-generic/tlb.h>
--
2.40.0
* [PATCH 4/4] riscv: enable HAVE_FAST_GUP if MMU
2023-12-19 17:50 [PATCH 0/4] riscv: support fast gup Jisheng Zhang
` (2 preceding siblings ...)
2023-12-19 17:50 ` [PATCH 3/4] riscv: enable MMU_GATHER_RCU_TABLE_FREE for SMP && MMU Jisheng Zhang
@ 2023-12-19 17:50 ` Jisheng Zhang
2023-12-31 6:37 ` Alexandre Ghiti
2024-01-25 21:30 ` [PATCH 0/4] riscv: support fast gup patchwork-bot+linux-riscv
4 siblings, 1 reply; 19+ messages in thread
From: Jisheng Zhang @ 2023-12-19 17:50 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Will Deacon,
Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra
Cc: linux-riscv, linux-kernel, linux-arch, linux-mm
Activate fast gup for riscv MMU platforms. Here are some
GUP_FAST_BENCHMARK performance numbers:
Before the patch:
GUP_FAST_BENCHMARK: Time: get:53203 put:5085 us
After the patch:
GUP_FAST_BENCHMARK: Time: get:17711 put:5060 us
The get time is reduced by 66.7%! IOW, 3x get speed!
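The pud_write() definition below is needed because the generic fast
gup path checks write permission at the pud level via
pud_access_permitted(); without an arch-provided pud_write(), the
generic fallback in include/linux/pgtable.h (reproduced approximately
here, from memory) would BUG():
	/* Generic fallback in include/linux/pgtable.h (approximate) */
	#ifndef pud_write
	static inline int pud_write(pud_t pud)
	{
		BUG();
		return 0;
	}
	#endif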
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
---
arch/riscv/Kconfig | 1 +
arch/riscv/include/asm/pgtable.h | 6 ++++++
2 files changed, 7 insertions(+)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index d3555173d9f4..04df9920282d 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -119,6 +119,7 @@ config RISCV
select HAVE_FUNCTION_GRAPH_RETVAL if HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_TRACER if !XIP_KERNEL && !PREEMPTION
select HAVE_EBPF_JIT if MMU
+ select HAVE_FAST_GUP if MMU
select HAVE_FUNCTION_ARG_ACCESS_API
select HAVE_FUNCTION_ERROR_INJECTION
select HAVE_GCC_PLUGINS
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index ab00235b018f..c6eb214139e6 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -673,6 +673,12 @@ static inline int pmd_write(pmd_t pmd)
return pte_write(pmd_pte(pmd));
}
+#define pud_write pud_write
+static inline int pud_write(pud_t pud)
+{
+ return pte_write(pud_pte(pud));
+}
+
static inline int pmd_dirty(pmd_t pmd)
{
return pte_dirty(pmd_pte(pmd));
--
2.40.0
* Re: [External] [PATCH 2/4] riscv: tlb: convert __p*d_free_tlb() to inline functions
2023-12-19 17:50 ` [PATCH 2/4] riscv: tlb: convert __p*d_free_tlb() to inline functions Jisheng Zhang
@ 2023-12-20 2:59 ` yunhui cui
2023-12-20 12:57 ` Jisheng Zhang
2023-12-31 6:24 ` Alexandre Ghiti
1 sibling, 1 reply; 19+ messages in thread
From: yunhui cui @ 2023-12-20 2:59 UTC (permalink / raw)
To: Jisheng Zhang
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Will Deacon,
Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
linux-riscv, linux-kernel, linux-arch, linux-mm
Hi Jisheng,
On Wed, Dec 20, 2023 at 2:04 AM Jisheng Zhang <jszhang@kernel.org> wrote:
>
> This is to prepare for enabling MMU_GATHER_RCU_TABLE_FREE.
> No functionality changes.
>
> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> ---
> arch/riscv/include/asm/pgalloc.h | 54 +++++++++++++++++++-------------
> 1 file changed, 32 insertions(+), 22 deletions(-)
>
> diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
> index a12fb83fa1f5..3c5e3bd15f46 100644
> --- a/arch/riscv/include/asm/pgalloc.h
> +++ b/arch/riscv/include/asm/pgalloc.h
> @@ -95,13 +95,16 @@ static inline void pud_free(struct mm_struct *mm, pud_t *pud)
> __pud_free(mm, pud);
> }
>
> -#define __pud_free_tlb(tlb, pud, addr) \
> -do { \
> - if (pgtable_l4_enabled) { \
> - pagetable_pud_dtor(virt_to_ptdesc(pud)); \
> - tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pud)); \
> - } \
> -} while (0)
> +static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
> + unsigned long addr)
> +{
> + if (pgtable_l4_enabled) {
> + struct ptdesc *ptdesc = virt_to_ptdesc(pud);
> +
> + pagetable_pud_dtor(ptdesc);
> + tlb_remove_page_ptdesc(tlb, ptdesc);
> + }
> +}
>
> #define p4d_alloc_one p4d_alloc_one
> static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
> @@ -130,11 +133,12 @@ static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
> __p4d_free(mm, p4d);
> }
>
> -#define __p4d_free_tlb(tlb, p4d, addr) \
> -do { \
> - if (pgtable_l5_enabled) \
> - tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(p4d)); \
> -} while (0)
> +static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
> + unsigned long addr)
> +{
> + if (pgtable_l5_enabled)
> + tlb_remove_page_ptdesc(tlb, virt_to_ptdesc(p4d));
> +}
> #endif /* __PAGETABLE_PMD_FOLDED */
>
> static inline void sync_kernel_mappings(pgd_t *pgd)
> @@ -159,19 +163,25 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
>
> #ifndef __PAGETABLE_PMD_FOLDED
>
> -#define __pmd_free_tlb(tlb, pmd, addr) \
> -do { \
> - pagetable_pmd_dtor(virt_to_ptdesc(pmd)); \
> - tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pmd)); \
> -} while (0)
> +static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
> + unsigned long addr)
> +{
> + struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
> +
> + pagetable_pmd_dtor(ptdesc);
> + tlb_remove_page_ptdesc(tlb, ptdesc);
> +}
>
> #endif /* __PAGETABLE_PMD_FOLDED */
>
> -#define __pte_free_tlb(tlb, pte, buf) \
> -do { \
> - pagetable_pte_dtor(page_ptdesc(pte)); \
> - tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
> -} while (0)
> +static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
> + unsigned long addr)
> +{
> + struct ptdesc *ptdesc = page_ptdesc(pte);
> +
> + pagetable_pte_dtor(ptdesc);
> + tlb_remove_page_ptdesc(tlb, ptdesc);
> +}
> #endif /* CONFIG_MMU */
>
> #endif /* _ASM_RISCV_PGALLOC_H */
> --
> 2.40.0
>
Why is it necessary to convert to inline functions?
Thanks,
Yunhui
* Re: [External] [PATCH 2/4] riscv: tlb: convert __p*d_free_tlb() to inline functions
2023-12-20 2:59 ` [External] " yunhui cui
@ 2023-12-20 12:57 ` Jisheng Zhang
0 siblings, 0 replies; 19+ messages in thread
From: Jisheng Zhang @ 2023-12-20 12:57 UTC (permalink / raw)
To: yunhui cui
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Will Deacon,
Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
linux-riscv, linux-kernel, linux-arch, linux-mm
On Wed, Dec 20, 2023 at 10:59:22AM +0800, yunhui cui wrote:
> Hi Jisheng,
Hi,
>
> On Wed, Dec 20, 2023 at 2:04 AM Jisheng Zhang <jszhang@kernel.org> wrote:
> >
> > This is to prepare for enabling MMU_GATHER_RCU_TABLE_FREE.
> > No functionality changes.
> >
> > Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> > ---
> > arch/riscv/include/asm/pgalloc.h | 54 +++++++++++++++++++-------------
> > 1 file changed, 32 insertions(+), 22 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
> > index a12fb83fa1f5..3c5e3bd15f46 100644
> > --- a/arch/riscv/include/asm/pgalloc.h
> > +++ b/arch/riscv/include/asm/pgalloc.h
> > @@ -95,13 +95,16 @@ static inline void pud_free(struct mm_struct *mm, pud_t *pud)
> > __pud_free(mm, pud);
> > }
> >
> > -#define __pud_free_tlb(tlb, pud, addr) \
> > -do { \
> > - if (pgtable_l4_enabled) { \
> > - pagetable_pud_dtor(virt_to_ptdesc(pud)); \
> > - tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pud)); \
> > - } \
> > -} while (0)
> > +static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
> > + unsigned long addr)
> > +{
> > + if (pgtable_l4_enabled) {
> > + struct ptdesc *ptdesc = virt_to_ptdesc(pud);
> > +
> > + pagetable_pud_dtor(ptdesc);
> > + tlb_remove_page_ptdesc(tlb, ptdesc);
> > + }
> > +}
> >
> > #define p4d_alloc_one p4d_alloc_one
> > static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
> > @@ -130,11 +133,12 @@ static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
> > __p4d_free(mm, p4d);
> > }
> >
> > -#define __p4d_free_tlb(tlb, p4d, addr) \
> > -do { \
> > - if (pgtable_l5_enabled) \
> > - tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(p4d)); \
> > -} while (0)
> > +static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
> > + unsigned long addr)
> > +{
> > + if (pgtable_l5_enabled)
> > + tlb_remove_page_ptdesc(tlb, virt_to_ptdesc(p4d));
> > +}
> > #endif /* __PAGETABLE_PMD_FOLDED */
> >
> > static inline void sync_kernel_mappings(pgd_t *pgd)
> > @@ -159,19 +163,25 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
> >
> > #ifndef __PAGETABLE_PMD_FOLDED
> >
> > -#define __pmd_free_tlb(tlb, pmd, addr) \
> > -do { \
> > - pagetable_pmd_dtor(virt_to_ptdesc(pmd)); \
> > - tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pmd)); \
> > -} while (0)
> > +static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
> > + unsigned long addr)
> > +{
> > + struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
> > +
> > + pagetable_pmd_dtor(ptdesc);
> > + tlb_remove_page_ptdesc(tlb, ptdesc);
> > +}
> >
> > #endif /* __PAGETABLE_PMD_FOLDED */
> >
> > -#define __pte_free_tlb(tlb, pte, buf) \
> > -do { \
> > - pagetable_pte_dtor(page_ptdesc(pte)); \
> > - tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
> > -} while (0)
> > +static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
> > + unsigned long addr)
> > +{
> > + struct ptdesc *ptdesc = page_ptdesc(pte);
> > +
> > + pagetable_pte_dtor(ptdesc);
> > + tlb_remove_page_ptdesc(tlb, ptdesc);
> > +}
> > #endif /* CONFIG_MMU */
> >
> > #endif /* _ASM_RISCV_PGALLOC_H */
> > --
> > 2.40.0
> >
>
> Why is it necessary to convert to inline functions?
Hmm, it's not necessary but a plus: the inline version's readability
and maintainability are better than the macros' (for instance, the
macro version evaluates virt_to_ptdesc(pud) twice and gets no type
checking of its arguments).
Regards
* Re: [PATCH 1/4] riscv: tlb: fix __p*d_free_tlb()
2023-12-19 17:50 ` [PATCH 1/4] riscv: tlb: fix __p*d_free_tlb() Jisheng Zhang
@ 2023-12-31 6:21 ` Alexandre Ghiti
2024-02-07 7:28 ` Alexandre Ghiti
2024-01-04 10:55 ` Alexandre Ghiti
1 sibling, 1 reply; 19+ messages in thread
From: Alexandre Ghiti @ 2023-12-31 6:21 UTC (permalink / raw)
To: Jisheng Zhang, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Will Deacon, Aneesh Kumar K . V, Andrew Morton, Nick Piggin,
Peter Zijlstra
Cc: linux-riscv, linux-kernel, linux-arch, linux-mm
Hi Jisheng,
On 19/12/2023 18:50, Jisheng Zhang wrote:
> If a non-leaf PTE, i.e. a pmd, pud or p4d, is modified, an sfence.vma
> is a must to be safe: imagine an implementation that caches non-leaf
> translations in the TLB. Although I haven't met such hardware so far,
> it's possible in theory.
>
> Signed-off-by: Jisheng Zhang<jszhang@kernel.org>
> ---
> arch/riscv/include/asm/pgalloc.h | 20 +++++++++++++++++---
> 1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
> index d169a4f41a2e..a12fb83fa1f5 100644
> --- a/arch/riscv/include/asm/pgalloc.h
> +++ b/arch/riscv/include/asm/pgalloc.h
> @@ -95,7 +95,13 @@ static inline void pud_free(struct mm_struct *mm, pud_t *pud)
> __pud_free(mm, pud);
> }
>
> -#define __pud_free_tlb(tlb, pud, addr) pud_free((tlb)->mm, pud)
> +#define __pud_free_tlb(tlb, pud, addr) \
> +do { \
> + if (pgtable_l4_enabled) { \
> + pagetable_pud_dtor(virt_to_ptdesc(pud)); \
> + tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pud)); \
The specification indeed states that an sfence.vma must be emitted
after a page directory modification. Your change is not enough though,
since eventually tlb_flush() is called, and in this function we should
add:
if (tlb->freed_tables)
	flush_tlb_mm(tlb->mm);
otherwise we are not guaranteed that a "global" sfence.vma is emitted.
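Something like the following; an untested sketch that assumes the
shape tlb_flush() has after the recent tlb_flush() rework (names from
asm-generic/tlb.h):
	static inline void tlb_flush(struct mmu_gather *tlb)
	{
	#ifdef CONFIG_MMU
		/* A freed page directory must trigger a flush of the whole mm */
		if (tlb->fullmm || tlb->need_flush_all || tlb->freed_tables)
			flush_tlb_mm(tlb->mm);
		else
			flush_tlb_mm_range(tlb->mm, tlb->start, tlb->end,
					   tlb_get_unmap_size(tlb));
	#endif
	}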
Would you be able to benchmark this change and see the performance impact?
Thanks,
Alex
> + } \
> +} while (0)
>
> #define p4d_alloc_one p4d_alloc_one
> static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
> @@ -124,7 +130,11 @@ static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
> __p4d_free(mm, p4d);
> }
>
> -#define __p4d_free_tlb(tlb, p4d, addr) p4d_free((tlb)->mm, p4d)
> +#define __p4d_free_tlb(tlb, p4d, addr) \
> +do { \
> + if (pgtable_l5_enabled) \
> + tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(p4d)); \
> +} while (0)
> #endif /* __PAGETABLE_PMD_FOLDED */
>
> static inline void sync_kernel_mappings(pgd_t *pgd)
> @@ -149,7 +159,11 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
>
> #ifndef __PAGETABLE_PMD_FOLDED
>
> -#define __pmd_free_tlb(tlb, pmd, addr) pmd_free((tlb)->mm, pmd)
> +#define __pmd_free_tlb(tlb, pmd, addr) \
> +do { \
> + pagetable_pmd_dtor(virt_to_ptdesc(pmd)); \
> + tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pmd)); \
> +} while (0)
>
> #endif /* __PAGETABLE_PMD_FOLDED */
>
* Re: [PATCH 2/4] riscv: tlb: convert __p*d_free_tlb() to inline functions
2023-12-19 17:50 ` [PATCH 2/4] riscv: tlb: convert __p*d_free_tlb() to inline functions Jisheng Zhang
2023-12-20 2:59 ` [External] " yunhui cui
@ 2023-12-31 6:24 ` Alexandre Ghiti
1 sibling, 0 replies; 19+ messages in thread
From: Alexandre Ghiti @ 2023-12-31 6:24 UTC (permalink / raw)
To: Jisheng Zhang, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Will Deacon, Aneesh Kumar K . V, Andrew Morton, Nick Piggin,
Peter Zijlstra
Cc: linux-riscv, linux-kernel, linux-arch, linux-mm
On 19/12/2023 18:50, Jisheng Zhang wrote:
> This is to prepare for enabling MMU_GATHER_RCU_TABLE_FREE.
> No functionality changes.
>
> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> ---
> arch/riscv/include/asm/pgalloc.h | 54 +++++++++++++++++++-------------
> 1 file changed, 32 insertions(+), 22 deletions(-)
>
> diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
> index a12fb83fa1f5..3c5e3bd15f46 100644
> --- a/arch/riscv/include/asm/pgalloc.h
> +++ b/arch/riscv/include/asm/pgalloc.h
> @@ -95,13 +95,16 @@ static inline void pud_free(struct mm_struct *mm, pud_t *pud)
> __pud_free(mm, pud);
> }
>
> -#define __pud_free_tlb(tlb, pud, addr) \
> -do { \
> - if (pgtable_l4_enabled) { \
> - pagetable_pud_dtor(virt_to_ptdesc(pud)); \
> - tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pud)); \
> - } \
> -} while (0)
> +static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
> + unsigned long addr)
> +{
> + if (pgtable_l4_enabled) {
> + struct ptdesc *ptdesc = virt_to_ptdesc(pud);
> +
> + pagetable_pud_dtor(ptdesc);
> + tlb_remove_page_ptdesc(tlb, ptdesc);
> + }
> +}
>
> #define p4d_alloc_one p4d_alloc_one
> static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
> @@ -130,11 +133,12 @@ static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
> __p4d_free(mm, p4d);
> }
>
> -#define __p4d_free_tlb(tlb, p4d, addr) \
> -do { \
> - if (pgtable_l5_enabled) \
> - tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(p4d)); \
> -} while (0)
> +static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
> + unsigned long addr)
> +{
> + if (pgtable_l5_enabled)
> + tlb_remove_page_ptdesc(tlb, virt_to_ptdesc(p4d));
> +}
> #endif /* __PAGETABLE_PMD_FOLDED */
>
> static inline void sync_kernel_mappings(pgd_t *pgd)
> @@ -159,19 +163,25 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
>
> #ifndef __PAGETABLE_PMD_FOLDED
>
> -#define __pmd_free_tlb(tlb, pmd, addr) \
> -do { \
> - pagetable_pmd_dtor(virt_to_ptdesc(pmd)); \
> - tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pmd)); \
> -} while (0)
> +static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
> + unsigned long addr)
> +{
> + struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
> +
> + pagetable_pmd_dtor(ptdesc);
> + tlb_remove_page_ptdesc(tlb, ptdesc);
> +}
>
> #endif /* __PAGETABLE_PMD_FOLDED */
>
> -#define __pte_free_tlb(tlb, pte, buf) \
> -do { \
> - pagetable_pte_dtor(page_ptdesc(pte)); \
> - tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
> -} while (0)
> +static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
> + unsigned long addr)
> +{
> + struct ptdesc *ptdesc = page_ptdesc(pte);
> +
> + pagetable_pte_dtor(ptdesc);
> + tlb_remove_page_ptdesc(tlb, ptdesc);
> +}
> #endif /* CONFIG_MMU */
>
> #endif /* _ASM_RISCV_PGALLOC_H */
You can add:
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Thanks,
Alex
* Re: [PATCH 3/4] riscv: enable MMU_GATHER_RCU_TABLE_FREE for SMP && MMU
2023-12-19 17:50 ` [PATCH 3/4] riscv: enable MMU_GATHER_RCU_TABLE_FREE for SMP && MMU Jisheng Zhang
@ 2023-12-31 6:32 ` Alexandre Ghiti
2024-01-02 3:23 ` Jisheng Zhang
0 siblings, 1 reply; 19+ messages in thread
From: Alexandre Ghiti @ 2023-12-31 6:32 UTC (permalink / raw)
To: Jisheng Zhang, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Will Deacon, Aneesh Kumar K . V, Andrew Morton, Nick Piggin,
Peter Zijlstra
Cc: linux-riscv, linux-kernel, linux-arch, linux-mm
On 19/12/2023 18:50, Jisheng Zhang wrote:
> In order to implement fast gup we need to ensure that the page
> table walker is protected from page table pages being freed from
> under it.
>
> The riscv situation is more complicated than on other architectures:
> some riscv platforms use IPIs to perform TLB shootdown (for example,
> platforms which support AIA; riscv_ipi_for_rfence is usually true on
> these platforms), while other riscv platforms rely on the SBI to
> perform TLB shootdown (riscv_ipi_for_rfence is usually false there).
> To keep software pagetable walkers safe in the SBI case we switch to
> RCU-based table free (MMU_GATHER_RCU_TABLE_FREE). See the comment
> below 'ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE' in
> include/asm-generic/tlb.h for more details.
>
> This patch enables MMU_GATHER_RCU_TABLE_FREE and then uses
>
> *tlb_remove_page_ptdesc() on platforms which use IPIs to perform
> TLB shootdown;
>
> *tlb_remove_ptdesc() on platforms which use the SBI to perform TLB
> shootdown.
Can you elaborate a bit more on what those functions do differently and
why we need to differentiate IPI vs SBI TLB shootdown? I don't
understand this.
Thanks,
Alex
> In both cases, disabling interrupts blocks the free and protects the
> fast gup page walker.
>
> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> ---
> arch/riscv/Kconfig | 1 +
> arch/riscv/include/asm/pgalloc.h | 23 ++++++++++++++++++-----
> arch/riscv/include/asm/tlb.h | 18 ++++++++++++++++++
> 3 files changed, 37 insertions(+), 5 deletions(-)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 24c1799e2ec4..d3555173d9f4 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -147,6 +147,7 @@ config RISCV
> select IRQ_FORCED_THREADING
> select KASAN_VMALLOC if KASAN
> select LOCK_MM_AND_FIND_VMA
> + select MMU_GATHER_RCU_TABLE_FREE if SMP && MMU
> select MODULES_USE_ELF_RELA if MODULES
> select MODULE_SECTIONS if MODULES
> select OF
> diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
> index 3c5e3bd15f46..deaf971253a2 100644
> --- a/arch/riscv/include/asm/pgalloc.h
> +++ b/arch/riscv/include/asm/pgalloc.h
> @@ -102,7 +102,10 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
> struct ptdesc *ptdesc = virt_to_ptdesc(pud);
>
> pagetable_pud_dtor(ptdesc);
> - tlb_remove_page_ptdesc(tlb, ptdesc);
> + if (riscv_use_ipi_for_rfence())
> + tlb_remove_page_ptdesc(tlb, ptdesc);
> + else
> + tlb_remove_ptdesc(tlb, ptdesc);
> }
> }
>
> @@ -136,8 +139,12 @@ static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
> static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
> unsigned long addr)
> {
> - if (pgtable_l5_enabled)
> - tlb_remove_page_ptdesc(tlb, virt_to_ptdesc(p4d));
> + if (pgtable_l5_enabled) {
> + if (riscv_use_ipi_for_rfence())
> + tlb_remove_page_ptdesc(tlb, virt_to_ptdesc(p4d));
> + else
> + tlb_remove_ptdesc(tlb, virt_to_ptdesc(p4d));
> + }
> }
> #endif /* __PAGETABLE_PMD_FOLDED */
>
> @@ -169,7 +176,10 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
> struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
>
> pagetable_pmd_dtor(ptdesc);
> - tlb_remove_page_ptdesc(tlb, ptdesc);
> + if (riscv_use_ipi_for_rfence())
> + tlb_remove_page_ptdesc(tlb, ptdesc);
> + else
> + tlb_remove_ptdesc(tlb, ptdesc);
> }
>
> #endif /* __PAGETABLE_PMD_FOLDED */
> @@ -180,7 +190,10 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
> struct ptdesc *ptdesc = page_ptdesc(pte);
>
> pagetable_pte_dtor(ptdesc);
> - tlb_remove_page_ptdesc(tlb, ptdesc);
> + if (riscv_use_ipi_for_rfence())
> + tlb_remove_page_ptdesc(tlb, ptdesc);
> + else
> + tlb_remove_ptdesc(tlb, ptdesc);
> }
> #endif /* CONFIG_MMU */
>
> diff --git a/arch/riscv/include/asm/tlb.h b/arch/riscv/include/asm/tlb.h
> index 1eb5682b2af6..a0b8b853503f 100644
> --- a/arch/riscv/include/asm/tlb.h
> +++ b/arch/riscv/include/asm/tlb.h
> @@ -10,6 +10,24 @@ struct mmu_gather;
>
> static void tlb_flush(struct mmu_gather *tlb);
>
> +#ifdef CONFIG_MMU
> +#include <linux/swap.h>
> +
> +/*
> + * While riscv platforms with riscv_ipi_for_rfence as true require an IPI to
> + * perform TLB shootdown, some platforms with riscv_ipi_for_rfence as false use
> + * SBI to perform TLB shootdown. To keep software pagetable walkers safe in this
> + * case we switch to RCU based table free (MMU_GATHER_RCU_TABLE_FREE). See the
> + * comment below 'ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE' in include/asm-generic/tlb.h
> + * for more details.
> + */
> +static inline void __tlb_remove_table(void *table)
> +{
> + free_page_and_swap_cache(table);
> +}
> +
> +#endif /* CONFIG_MMU */
> +
> #define tlb_flush tlb_flush
> #include <asm-generic/tlb.h>
>
* Re: [PATCH 4/4] riscv: enable HAVE_FAST_GUP if MMU
2023-12-19 17:50 ` [PATCH 4/4] riscv: enable HAVE_FAST_GUP if MMU Jisheng Zhang
@ 2023-12-31 6:37 ` Alexandre Ghiti
2024-01-02 3:25 ` Jisheng Zhang
0 siblings, 1 reply; 19+ messages in thread
From: Alexandre Ghiti @ 2023-12-31 6:37 UTC (permalink / raw)
To: Jisheng Zhang, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Will Deacon, Aneesh Kumar K . V, Andrew Morton, Nick Piggin,
Peter Zijlstra
Cc: linux-riscv, linux-kernel, linux-arch, linux-mm
On 19/12/2023 18:50, Jisheng Zhang wrote:
> Activate the fast gup for riscv mmu platforms. Here are some
> GUP_FAST_BENCHMARK performance numbers:
>
> Before the patch:
> GUP_FAST_BENCHMARK: Time: get:53203 put:5085 us
>
> After the patch:
> GUP_FAST_BENCHMARK: Time: get:17711 put:5060 us
On which platform did you run this benchmark?
>
> The get time is reduced by 66.7%! IOW, 3x get speed!
Well done!
Thanks,
Alex
>
> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> ---
> arch/riscv/Kconfig | 1 +
> arch/riscv/include/asm/pgtable.h | 6 ++++++
> 2 files changed, 7 insertions(+)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index d3555173d9f4..04df9920282d 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -119,6 +119,7 @@ config RISCV
> select HAVE_FUNCTION_GRAPH_RETVAL if HAVE_FUNCTION_GRAPH_TRACER
> select HAVE_FUNCTION_TRACER if !XIP_KERNEL && !PREEMPTION
> select HAVE_EBPF_JIT if MMU
> + select HAVE_FAST_GUP if MMU
> select HAVE_FUNCTION_ARG_ACCESS_API
> select HAVE_FUNCTION_ERROR_INJECTION
> select HAVE_GCC_PLUGINS
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index ab00235b018f..c6eb214139e6 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -673,6 +673,12 @@ static inline int pmd_write(pmd_t pmd)
> return pte_write(pmd_pte(pmd));
> }
>
> +#define pud_write pud_write
> +static inline int pud_write(pud_t pud)
> +{
> + return pte_write(pud_pte(pud));
> +}
> +
> static inline int pmd_dirty(pmd_t pmd)
> {
> return pte_dirty(pmd_pte(pmd));
* Re: [PATCH 3/4] riscv: enable MMU_GATHER_RCU_TABLE_FREE for SMP && MMU
2023-12-31 6:32 ` Alexandre Ghiti
@ 2024-01-02 3:23 ` Jisheng Zhang
2024-01-04 10:45 ` Alexandre Ghiti
0 siblings, 1 reply; 19+ messages in thread
From: Jisheng Zhang @ 2024-01-02 3:23 UTC (permalink / raw)
To: Alexandre Ghiti
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Will Deacon,
Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
linux-riscv, linux-kernel, linux-arch, linux-mm
On Sun, Dec 31, 2023 at 07:32:47AM +0100, Alexandre Ghiti wrote:
> On 19/12/2023 18:50, Jisheng Zhang wrote:
> > In order to implement fast gup we need to ensure that the page
> > table walker is protected from page table pages being freed from
> > under it.
> >
> > The riscv situation is more complicated than on other architectures:
> > some riscv platforms use IPIs to perform TLB shootdown (for example,
> > platforms which support AIA; riscv_ipi_for_rfence is usually true on
> > these platforms), while other riscv platforms rely on the SBI to
> > perform TLB shootdown (riscv_ipi_for_rfence is usually false there).
> > To keep software pagetable walkers safe in the SBI case we switch to
> > RCU-based table free (MMU_GATHER_RCU_TABLE_FREE). See the comment
> > below 'ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE' in
> > include/asm-generic/tlb.h for more details.
> >
> > This patch enables MMU_GATHER_RCU_TABLE_FREE and then uses
> >
> > *tlb_remove_page_ptdesc() on platforms which use IPIs to perform
> > TLB shootdown;
> >
> > *tlb_remove_ptdesc() on platforms which use the SBI to perform TLB
> > shootdown.
>
>
> Can you elaborate a bit more on what those functions do differently and why
> we need to differentiate IPI vs SBI TLB shootdown? I don't understand this.
Hi Alex,
If IPI, the local_irq_save() in lockless_pages_from_mm() of the fast
gup code path will block page table pages from being freed; I think
the comment there is excellent.
If SBI, the local_irq_save() in lockless_pages_from_mm() can't achieve
that goal, however: local_irq_save() only disables the S-privilege IPI
irq, it can't disable the M-privilege one, which the SBI implementation
uses to shoot down TLB entries. So we need the
MMU_GATHER_RCU_TABLE_FREE helper for the SBI case.
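Roughly, what MMU_GATHER_RCU_TABLE_FREE buys us is deferring the
actual free through an RCU grace period; condensed (and paraphrased,
not verbatim) from mm/mmu_gather.c:
	/* Condensed from mm/mmu_gather.c */
	static void tlb_remove_table_rcu(struct rcu_head *head)
	{
		__tlb_remove_table_free(container_of(head, struct mmu_table_batch, rcu));
	}
	static void tlb_table_flush(struct mmu_gather *tlb)
	{
		struct mmu_table_batch **batch = &tlb->batch;
		if (*batch) {
			tlb_table_invalidate(tlb);
			/*
			 * The grace period cannot end while a lockless
			 * walker sits in an IRQs-off region, so the batched
			 * tables stay alive until fast gup re-enables
			 * interrupts.
			 */
			call_rcu(&(*batch)->rcu, tlb_remove_table_rcu);
			*batch = NULL;
		}
	}
The __tlb_remove_table() added by this patch is the arch hook invoked
for each batched table once the grace period has elapsed.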
Thanks
>
> > In both cases, disabling interrupts blocks the free and protects the
> > fast gup page walker.
> >
> > Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> > ---
> > arch/riscv/Kconfig | 1 +
> > arch/riscv/include/asm/pgalloc.h | 23 ++++++++++++++++++-----
> > arch/riscv/include/asm/tlb.h | 18 ++++++++++++++++++
> > 3 files changed, 37 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 24c1799e2ec4..d3555173d9f4 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -147,6 +147,7 @@ config RISCV
> > select IRQ_FORCED_THREADING
> > select KASAN_VMALLOC if KASAN
> > select LOCK_MM_AND_FIND_VMA
> > + select MMU_GATHER_RCU_TABLE_FREE if SMP && MMU
> > select MODULES_USE_ELF_RELA if MODULES
> > select MODULE_SECTIONS if MODULES
> > select OF
> > diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
> > index 3c5e3bd15f46..deaf971253a2 100644
> > --- a/arch/riscv/include/asm/pgalloc.h
> > +++ b/arch/riscv/include/asm/pgalloc.h
> > @@ -102,7 +102,10 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
> > struct ptdesc *ptdesc = virt_to_ptdesc(pud);
> > pagetable_pud_dtor(ptdesc);
> > - tlb_remove_page_ptdesc(tlb, ptdesc);
> > + if (riscv_use_ipi_for_rfence())
> > + tlb_remove_page_ptdesc(tlb, ptdesc);
> > + else
> > + tlb_remove_ptdesc(tlb, ptdesc);
> > }
> > }
> > @@ -136,8 +139,12 @@ static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
> > static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
> > unsigned long addr)
> > {
> > - if (pgtable_l5_enabled)
> > - tlb_remove_page_ptdesc(tlb, virt_to_ptdesc(p4d));
> > + if (pgtable_l5_enabled) {
> > + if (riscv_use_ipi_for_rfence())
> > + tlb_remove_page_ptdesc(tlb, virt_to_ptdesc(p4d));
> > + else
> > + tlb_remove_ptdesc(tlb, virt_to_ptdesc(p4d));
> > + }
> > }
> > #endif /* __PAGETABLE_PMD_FOLDED */
> > @@ -169,7 +176,10 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
> > struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
> > pagetable_pmd_dtor(ptdesc);
> > - tlb_remove_page_ptdesc(tlb, ptdesc);
> > + if (riscv_use_ipi_for_rfence())
> > + tlb_remove_page_ptdesc(tlb, ptdesc);
> > + else
> > + tlb_remove_ptdesc(tlb, ptdesc);
> > }
> > #endif /* __PAGETABLE_PMD_FOLDED */
> > @@ -180,7 +190,10 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
> > struct ptdesc *ptdesc = page_ptdesc(pte);
> > pagetable_pte_dtor(ptdesc);
> > - tlb_remove_page_ptdesc(tlb, ptdesc);
> > + if (riscv_use_ipi_for_rfence())
> > + tlb_remove_page_ptdesc(tlb, ptdesc);
> > + else
> > + tlb_remove_ptdesc(tlb, ptdesc);
> > }
> > #endif /* CONFIG_MMU */
> > diff --git a/arch/riscv/include/asm/tlb.h b/arch/riscv/include/asm/tlb.h
> > index 1eb5682b2af6..a0b8b853503f 100644
> > --- a/arch/riscv/include/asm/tlb.h
> > +++ b/arch/riscv/include/asm/tlb.h
> > @@ -10,6 +10,24 @@ struct mmu_gather;
> > static void tlb_flush(struct mmu_gather *tlb);
> > +#ifdef CONFIG_MMU
> > +#include <linux/swap.h>
> > +
> > +/*
> > + * While riscv platforms with riscv_ipi_for_rfence as true require an IPI to
> > + * perform TLB shootdown, some platforms with riscv_ipi_for_rfence as false use
> > + * SBI to perform TLB shootdown. To keep software pagetable walkers safe in this
> > + * case we switch to RCU based table free (MMU_GATHER_RCU_TABLE_FREE). See the
> > + * comment below 'ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE' in include/asm-generic/tlb.h
> > + * for more details.
> > + */
> > +static inline void __tlb_remove_table(void *table)
> > +{
> > + free_page_and_swap_cache(table);
> > +}
> > +
> > +#endif /* CONFIG_MMU */
> > +
> > #define tlb_flush tlb_flush
> > #include <asm-generic/tlb.h>
* Re: [PATCH 4/4] riscv: enable HAVE_FAST_GUP if MMU
2023-12-31 6:37 ` Alexandre Ghiti
@ 2024-01-02 3:25 ` Jisheng Zhang
2024-01-04 10:46 ` Alexandre Ghiti
0 siblings, 1 reply; 19+ messages in thread
From: Jisheng Zhang @ 2024-01-02 3:25 UTC (permalink / raw)
To: Alexandre Ghiti
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Will Deacon,
Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
linux-riscv, linux-kernel, linux-arch, linux-mm
On Sun, Dec 31, 2023 at 07:37:33AM +0100, Alexandre Ghiti wrote:
> On 19/12/2023 18:50, Jisheng Zhang wrote:
> > Activate the fast gup for riscv mmu platforms. Here are some
> > GUP_FAST_BENCHMARK performance numbers:
> >
> > Before the patch:
> > GUP_FAST_BENCHMARK: Time: get:53203 put:5085 us
> >
> > After the patch:
> > GUP_FAST_BENCHMARK: Time: get:17711 put:5060 us
>
>
> On which platform did you run this benchmark?
T-HEAD TH1520 (cpufreq isn't enabled since the clk/pll support isn't
upstreamed, so the CPU is running at the default frequency set by
U-Boot).
>
>
> >
> > The get time is reduced by 66.7%! IOW, 3x get speed!
>
>
> Well done!
>
> Thanks,
>
> Alex
>
>
> >
> > Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> > ---
> > arch/riscv/Kconfig | 1 +
> > arch/riscv/include/asm/pgtable.h | 6 ++++++
> > 2 files changed, 7 insertions(+)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index d3555173d9f4..04df9920282d 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -119,6 +119,7 @@ config RISCV
> > select HAVE_FUNCTION_GRAPH_RETVAL if HAVE_FUNCTION_GRAPH_TRACER
> > select HAVE_FUNCTION_TRACER if !XIP_KERNEL && !PREEMPTION
> > select HAVE_EBPF_JIT if MMU
> > + select HAVE_FAST_GUP if MMU
> > select HAVE_FUNCTION_ARG_ACCESS_API
> > select HAVE_FUNCTION_ERROR_INJECTION
> > select HAVE_GCC_PLUGINS
> > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > index ab00235b018f..c6eb214139e6 100644
> > --- a/arch/riscv/include/asm/pgtable.h
> > +++ b/arch/riscv/include/asm/pgtable.h
> > @@ -673,6 +673,12 @@ static inline int pmd_write(pmd_t pmd)
> > return pte_write(pmd_pte(pmd));
> > }
> > +#define pud_write pud_write
> > +static inline int pud_write(pud_t pud)
> > +{
> > + return pte_write(pud_pte(pud));
> > +}
> > +
> > static inline int pmd_dirty(pmd_t pmd)
> > {
> > return pte_dirty(pmd_pte(pmd));
* Re: [PATCH 3/4] riscv: enable MMU_GATHER_RCU_TABLE_FREE for SMP && MMU
2024-01-02 3:23 ` Jisheng Zhang
@ 2024-01-04 10:45 ` Alexandre Ghiti
0 siblings, 0 replies; 19+ messages in thread
From: Alexandre Ghiti @ 2024-01-04 10:45 UTC (permalink / raw)
To: Jisheng Zhang
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Will Deacon,
Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
linux-riscv, linux-kernel, linux-arch, linux-mm
Hi Jisheng,
On 02/01/2024 04:23, Jisheng Zhang wrote:
> On Sun, Dec 31, 2023 at 07:32:47AM +0100, Alexandre Ghiti wrote:
>> On 19/12/2023 18:50, Jisheng Zhang wrote:
>>> In order to implement fast gup we need to ensure that the page
>>> table walker is protected from page table pages being freed from
>>> under it.
>>>
>>> The riscv situation is more complicated than on other architectures:
>>> some riscv platforms use IPIs to perform TLB shootdown (for example,
>>> platforms which support AIA; riscv_ipi_for_rfence is usually true on
>>> these platforms), while other riscv platforms rely on the SBI to
>>> perform TLB shootdown (riscv_ipi_for_rfence is usually false there).
>>> To keep software pagetable walkers safe in the SBI case we switch to
>>> RCU-based table free (MMU_GATHER_RCU_TABLE_FREE). See the comment
>>> below 'ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE' in
>>> include/asm-generic/tlb.h for more details.
>>>
>>> This patch enables MMU_GATHER_RCU_TABLE_FREE and then uses
>>>
>>> *tlb_remove_page_ptdesc() on platforms which use IPIs to perform
>>> TLB shootdown;
>>>
>>> *tlb_remove_ptdesc() on platforms which use the SBI to perform TLB
>>> shootdown.
>>
>> Can you elaborate a bit more on what those functions do differently and why
>> we need to differentiate IPI vs SBI TLB shootdown? I don't understand this.
> Hi Alex,
>
> If IPI, the local_irq_save() in lockless_pages_from_mm() of the fast
> gup code path will block page table pages from being freed; I think
> the comment there is excellent.
>
> If SBI, the local_irq_save() in lockless_pages_from_mm() can't achieve
> that goal, however: local_irq_save() only disables the S-privilege IPI
> irq, it can't disable the M-privilege one, which the SBI implementation
> uses to shoot down TLB entries. So we need the
> MMU_GATHER_RCU_TABLE_FREE helper for the SBI case.
Ok, I get it now, can you add the following link to your commit
description
https://elixir.bootlin.com/linux/v6.6/source/mm/mmu_gather.c#L162 ? It
describes the problem very clearly.
> Thanks
>
>>> In both cases, disabling interrupts blocks the free and protects the
>>> fast gup page walker.
>>>
>>> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
>>> ---
>>> arch/riscv/Kconfig | 1 +
>>> arch/riscv/include/asm/pgalloc.h | 23 ++++++++++++++++++-----
>>> arch/riscv/include/asm/tlb.h | 18 ++++++++++++++++++
>>> 3 files changed, 37 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>> index 24c1799e2ec4..d3555173d9f4 100644
>>> --- a/arch/riscv/Kconfig
>>> +++ b/arch/riscv/Kconfig
>>> @@ -147,6 +147,7 @@ config RISCV
>>> select IRQ_FORCED_THREADING
>>> select KASAN_VMALLOC if KASAN
>>> select LOCK_MM_AND_FIND_VMA
>>> + select MMU_GATHER_RCU_TABLE_FREE if SMP && MMU
>>> select MODULES_USE_ELF_RELA if MODULES
>>> select MODULE_SECTIONS if MODULES
>>> select OF
>>> diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
>>> index 3c5e3bd15f46..deaf971253a2 100644
>>> --- a/arch/riscv/include/asm/pgalloc.h
>>> +++ b/arch/riscv/include/asm/pgalloc.h
>>> @@ -102,7 +102,10 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
>>> struct ptdesc *ptdesc = virt_to_ptdesc(pud);
>>> pagetable_pud_dtor(ptdesc);
>>> - tlb_remove_page_ptdesc(tlb, ptdesc);
>>> + if (riscv_use_ipi_for_rfence())
>>> + tlb_remove_page_ptdesc(tlb, ptdesc);
>>> + else
>>> + tlb_remove_ptdesc(tlb, ptdesc);
>>> }
>>> }
>>> @@ -136,8 +139,12 @@ static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
>>> static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
>>> unsigned long addr)
>>> {
>>> - if (pgtable_l5_enabled)
>>> - tlb_remove_page_ptdesc(tlb, virt_to_ptdesc(p4d));
>>> + if (pgtable_l5_enabled) {
>>> + if (riscv_use_ipi_for_rfence())
>>> + tlb_remove_page_ptdesc(tlb, virt_to_ptdesc(p4d));
>>> + else
>>> + tlb_remove_ptdesc(tlb, virt_to_ptdesc(p4d));
>>> + }
>>> }
>>> #endif /* __PAGETABLE_PMD_FOLDED */
>>> @@ -169,7 +176,10 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
>>> struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
>>> pagetable_pmd_dtor(ptdesc);
>>> - tlb_remove_page_ptdesc(tlb, ptdesc);
>>> + if (riscv_use_ipi_for_rfence())
>>> + tlb_remove_page_ptdesc(tlb, ptdesc);
>>> + else
>>> + tlb_remove_ptdesc(tlb, ptdesc);
>>> }
>>> #endif /* __PAGETABLE_PMD_FOLDED */
>>> @@ -180,7 +190,10 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
>>> struct ptdesc *ptdesc = page_ptdesc(pte);
>>> pagetable_pte_dtor(ptdesc);
>>> - tlb_remove_page_ptdesc(tlb, ptdesc);
>>> + if (riscv_use_ipi_for_rfence())
>>> + tlb_remove_page_ptdesc(tlb, ptdesc);
>>> + else
>>> + tlb_remove_ptdesc(tlb, ptdesc);
>>> }
>>> #endif /* CONFIG_MMU */
>>> diff --git a/arch/riscv/include/asm/tlb.h b/arch/riscv/include/asm/tlb.h
>>> index 1eb5682b2af6..a0b8b853503f 100644
>>> --- a/arch/riscv/include/asm/tlb.h
>>> +++ b/arch/riscv/include/asm/tlb.h
>>> @@ -10,6 +10,24 @@ struct mmu_gather;
>>> static void tlb_flush(struct mmu_gather *tlb);
>>> +#ifdef CONFIG_MMU
>>> +#include <linux/swap.h>
>>> +
>>> +/*
>>> + * While riscv platforms with riscv_ipi_for_rfence as true require an IPI to
>>> + * perform TLB shootdown, some platforms with riscv_ipi_for_rfence as false use
>>> + * SBI to perform TLB shootdown. To keep software pagetable walkers safe in this
>>> + * case we switch to RCU based table free (MMU_GATHER_RCU_TABLE_FREE). See the
>>> + * comment below 'ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE' in include/asm-generic/tlb.h
A direct link would be better: I did not find the comment you mention
here, and even then, it can still move around later.
And then you can add:
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Thanks!
Alex
>>> + * for more details.
>>> + */
>>> +static inline void __tlb_remove_table(void *table)
>>> +{
>>> + free_page_and_swap_cache(table);
>>> +}
>>> +
>>> +#endif /* CONFIG_MMU */
>>> +
>>> #define tlb_flush tlb_flush
>>> #include <asm-generic/tlb.h>
* Re: [PATCH 4/4] riscv: enable HAVE_FAST_GUP if MMU
2024-01-02 3:25 ` Jisheng Zhang
@ 2024-01-04 10:46 ` Alexandre Ghiti
0 siblings, 0 replies; 19+ messages in thread
From: Alexandre Ghiti @ 2024-01-04 10:46 UTC (permalink / raw)
To: Jisheng Zhang
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Will Deacon,
Aneesh Kumar K . V, Andrew Morton, Nick Piggin, Peter Zijlstra,
linux-riscv, linux-kernel, linux-arch, linux-mm
On 02/01/2024 04:25, Jisheng Zhang wrote:
> On Sun, Dec 31, 2023 at 07:37:33AM +0100, Alexandre Ghiti wrote:
>> On 19/12/2023 18:50, Jisheng Zhang wrote:
>>> Activate the fast gup for riscv mmu platforms. Here are some
>>> GUP_FAST_BENCHMARK performance numbers:
>>>
>>> Before the patch:
>>> GUP_FAST_BENCHMARK: Time: get:53203 put:5085 us
>>>
>>> After the patch:
>>> GUP_FAST_BENCHMARK: Time: get:17711 put:5060 us
>>
>> On which platform did you run this benchmark?
> T-HEAD th1520(cpufreq isn't enabled since the clk/pll isn't upstreamed,
> so cpu is running at the default freq set by u-boot)
>>
>>> The get time is reduced by 66.7%! IOW, 3x get speed!
>>
>> Well done!
>>
>> Thanks,
>>
>> Alex
>>
>>
>>> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
>>> ---
>>> arch/riscv/Kconfig | 1 +
>>> arch/riscv/include/asm/pgtable.h | 6 ++++++
>>> 2 files changed, 7 insertions(+)
>>>
>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>> index d3555173d9f4..04df9920282d 100644
>>> --- a/arch/riscv/Kconfig
>>> +++ b/arch/riscv/Kconfig
>>> @@ -119,6 +119,7 @@ config RISCV
>>> select HAVE_FUNCTION_GRAPH_RETVAL if HAVE_FUNCTION_GRAPH_TRACER
>>> select HAVE_FUNCTION_TRACER if !XIP_KERNEL && !PREEMPTION
>>> select HAVE_EBPF_JIT if MMU
>>> + select HAVE_FAST_GUP if MMU
>>> select HAVE_FUNCTION_ARG_ACCESS_API
>>> select HAVE_FUNCTION_ERROR_INJECTION
>>> select HAVE_GCC_PLUGINS
>>> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
>>> index ab00235b018f..c6eb214139e6 100644
>>> --- a/arch/riscv/include/asm/pgtable.h
>>> +++ b/arch/riscv/include/asm/pgtable.h
>>> @@ -673,6 +673,12 @@ static inline int pmd_write(pmd_t pmd)
>>> return pte_write(pmd_pte(pmd));
>>> }
>>> +#define pud_write pud_write
>>> +static inline int pud_write(pud_t pud)
>>> +{
>>> + return pte_write(pud_pte(pud));
>>> +}
>>> +
>>> static inline int pmd_dirty(pmd_t pmd)
>>> {
>>> return pte_dirty(pmd_pte(pmd));
Thanks, you can add:
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Thanks,
Alex
* Re: [PATCH 1/4] riscv: tlb: fix __p*d_free_tlb()
2023-12-19 17:50 ` [PATCH 1/4] riscv: tlb: fix __p*d_free_tlb() Jisheng Zhang
2023-12-31 6:21 ` Alexandre Ghiti
@ 2024-01-04 10:55 ` Alexandre Ghiti
2024-01-10 14:52 ` Palmer Dabbelt
1 sibling, 1 reply; 19+ messages in thread
From: Alexandre Ghiti @ 2024-01-04 10:55 UTC (permalink / raw)
To: Jisheng Zhang, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Will Deacon, Aneesh Kumar K . V, Andrew Morton, Nick Piggin,
Peter Zijlstra, Conor Dooley
Cc: linux-riscv, linux-kernel, linux-arch, linux-mm
On 19/12/2023 18:50, Jisheng Zhang wrote:
> If a non-leaf PTE, i.e. a pmd, pud or p4d, is modified, an sfence.vma
> is a must to be safe: imagine an implementation that caches non-leaf
> translations in the TLB. Although I haven't met such hardware so far,
> it's possible in theory.
And since this is a fix, it would be worth trying to add a Fixes tag
here. Not easy I agree because it fixes several commits (I have
07037db5d479f, e8a62cc26ddf5, d10efa21a9374 and c5e9b2c2ae822 if you
implement tlb_flush() as I suggested).
So I would add the latest commit as the Fixes commit (which would be
c5e9b2c2ae822), and then I'd send a patch to stable for each commit with
the right Fixes tag... @Conor: let me know if you have a simpler idea or
if this is wrong.
Thanks,
Alex
> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> ---
> arch/riscv/include/asm/pgalloc.h | 20 +++++++++++++++++---
> 1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
> index d169a4f41a2e..a12fb83fa1f5 100644
> --- a/arch/riscv/include/asm/pgalloc.h
> +++ b/arch/riscv/include/asm/pgalloc.h
> @@ -95,7 +95,13 @@ static inline void pud_free(struct mm_struct *mm, pud_t *pud)
> __pud_free(mm, pud);
> }
>
> -#define __pud_free_tlb(tlb, pud, addr) pud_free((tlb)->mm, pud)
> +#define __pud_free_tlb(tlb, pud, addr) \
> +do { \
> + if (pgtable_l4_enabled) { \
> + pagetable_pud_dtor(virt_to_ptdesc(pud)); \
> + tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pud)); \
> + } \
> +} while (0)
>
> #define p4d_alloc_one p4d_alloc_one
> static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
> @@ -124,7 +130,11 @@ static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
> __p4d_free(mm, p4d);
> }
>
> -#define __p4d_free_tlb(tlb, p4d, addr) p4d_free((tlb)->mm, p4d)
> +#define __p4d_free_tlb(tlb, p4d, addr) \
> +do { \
> + if (pgtable_l5_enabled) \
> + tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(p4d)); \
> +} while (0)
> #endif /* __PAGETABLE_PMD_FOLDED */
>
> static inline void sync_kernel_mappings(pgd_t *pgd)
> @@ -149,7 +159,11 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
>
> #ifndef __PAGETABLE_PMD_FOLDED
>
> -#define __pmd_free_tlb(tlb, pmd, addr) pmd_free((tlb)->mm, pmd)
> +#define __pmd_free_tlb(tlb, pmd, addr) \
> +do { \
> + pagetable_pmd_dtor(virt_to_ptdesc(pmd)); \
> + tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pmd)); \
> +} while (0)
>
> #endif /* __PAGETABLE_PMD_FOLDED */
>
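For background on why the fix routes the freed tables through
tlb_remove_page_ptdesc() instead of freeing them immediately: pages
queued on the mmu_gather are only released after the TLB flush has
run, so the fence is guaranteed to precede the free. A condensed
sketch of the generic flow (illustrative only; the actual code lives
in mm/mmu_gather.c):

	static void tlb_flush_mmu_sketch(struct mmu_gather *tlb)
	{
		/* flush first: this is where the sfence.vma gets emitted */
		tlb_flush_mmu_tlbonly(tlb);
		/* only then free the queued page-table pages */
		tlb_batch_pages_flush(tlb);
	}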
* Re: [PATCH 1/4] riscv: tlb: fix __p*d_free_tlb()
2024-01-04 10:55 ` Alexandre Ghiti
@ 2024-01-10 14:52 ` Palmer Dabbelt
0 siblings, 0 replies; 19+ messages in thread
From: Palmer Dabbelt @ 2024-01-10 14:52 UTC (permalink / raw)
To: alex
Cc: jszhang, Paul Walmsley, aou, Will Deacon, aneesh.kumar, akpm,
npiggin, peterz, Conor Dooley, linux-riscv, linux-kernel,
linux-arch, linux-mm
On Thu, 04 Jan 2024 02:55:40 PST (-0800), alex@ghiti.fr wrote:
> On 19/12/2023 18:50, Jisheng Zhang wrote:
>> If a non-leaf PTE (i.e. a pmd, pud or p4d) is modified, an sfence.vma
>> is required for safety: an implementation may cache non-leaf
>> translations in the TLB, and although I haven't met such hardware so
>> far, it is possible in theory.
>
>
> And since this is a fix, it would be worth trying to add a Fixes tag
> here. Not easy, I agree, because it fixes several commits (I have
> 07037db5d479f, e8a62cc26ddf5, d10efa21a9374 and c5e9b2c2ae822 if you
> implement tlb_flush() as I suggested).
>
> So I would add the latest commit as the Fixes commit (which would be
> c5e9b2c2ae822), and then I'd send a patch to stable for each commit with
> the right Fixes tag... @Conor: let me know if you have a simpler idea or
> if this is wrong.
I just went with
Fixes: c5e9b2c2ae82 ("riscv: Improve tlb_flush()")
Cc: stable@vger.kernel.org
hopefully that's fine. It's still getting tested; it's batched up with
some other stuff and I managed to find a bad merge, so it might take a
bit...
>
> Thanks,
>
> Alex
>
>
>> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
>> ---
>> arch/riscv/include/asm/pgalloc.h | 20 +++++++++++++++++---
>> 1 file changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
>> index d169a4f41a2e..a12fb83fa1f5 100644
>> --- a/arch/riscv/include/asm/pgalloc.h
>> +++ b/arch/riscv/include/asm/pgalloc.h
>> @@ -95,7 +95,13 @@ static inline void pud_free(struct mm_struct *mm, pud_t *pud)
>> __pud_free(mm, pud);
>> }
>>
>> -#define __pud_free_tlb(tlb, pud, addr) pud_free((tlb)->mm, pud)
>> +#define __pud_free_tlb(tlb, pud, addr) \
>> +do { \
>> + if (pgtable_l4_enabled) { \
>> + pagetable_pud_dtor(virt_to_ptdesc(pud)); \
>> + tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pud)); \
>> + } \
>> +} while (0)
>>
>> #define p4d_alloc_one p4d_alloc_one
>> static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
>> @@ -124,7 +130,11 @@ static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
>> __p4d_free(mm, p4d);
>> }
>>
>> -#define __p4d_free_tlb(tlb, p4d, addr) p4d_free((tlb)->mm, p4d)
>> +#define __p4d_free_tlb(tlb, p4d, addr) \
>> +do { \
>> + if (pgtable_l5_enabled) \
>> + tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(p4d)); \
>> +} while (0)
>> #endif /* __PAGETABLE_PMD_FOLDED */
>>
>> static inline void sync_kernel_mappings(pgd_t *pgd)
>> @@ -149,7 +159,11 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
>>
>> #ifndef __PAGETABLE_PMD_FOLDED
>>
>> -#define __pmd_free_tlb(tlb, pmd, addr) pmd_free((tlb)->mm, pmd)
>> +#define __pmd_free_tlb(tlb, pmd, addr) \
>> +do { \
>> + pagetable_pmd_dtor(virt_to_ptdesc(pmd)); \
>> + tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pmd)); \
>> +} while (0)
>>
>> #endif /* __PAGETABLE_PMD_FOLDED */
>>
* Re: [PATCH 0/4] riscv: support fast gup
2023-12-19 17:50 [PATCH 0/4] riscv: support fast gup Jisheng Zhang
` (3 preceding siblings ...)
2023-12-19 17:50 ` [PATCH 4/4] riscv: enable HAVE_FAST_GUP if MMU Jisheng Zhang
@ 2024-01-25 21:30 ` patchwork-bot+linux-riscv
4 siblings, 0 replies; 19+ messages in thread
From: patchwork-bot+linux-riscv @ 2024-01-25 21:30 UTC (permalink / raw)
To: Jisheng Zhang
Cc: linux-riscv, paul.walmsley, palmer, aou, will, aneesh.kumar,
akpm, npiggin, peterz, linux-kernel, linux-arch, linux-mm
Hello:
This series was applied to riscv/linux.git (for-next)
by Palmer Dabbelt <palmer@rivosinc.com>:
On Wed, 20 Dec 2023 01:50:42 +0800 you wrote:
> This series adds fast gup support to riscv.
>
> The first patch fixes a bug in __p*d_free_tlb(). Per the riscv
> privileged spec, if a non-leaf PTE (i.e. a pmd, pud or p4d) is
> modified, an sfence.vma is required.
>
> The 2nd patch is a preparation patch.
>
> [...]
Here is the summary with links:
- [1/4] riscv: tlb: fix __p*d_free_tlb()
https://git.kernel.org/riscv/c/8246601a7d39
- [2/4] riscv: tlb: convert __p*d_free_tlb() to inline functions
https://git.kernel.org/riscv/c/40d1bb92a493
- [3/4] riscv: enable MMU_GATHER_RCU_TABLE_FREE for SMP && MMU
https://git.kernel.org/riscv/c/69be3fb111e7
- [4/4] riscv: enable HAVE_FAST_GUP if MMU
https://git.kernel.org/riscv/c/3f910b7a522e
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH 1/4] riscv: tlb: fix __p*d_free_tlb()
2023-12-31 6:21 ` Alexandre Ghiti
@ 2024-02-07 7:28 ` Alexandre Ghiti
0 siblings, 0 replies; 19+ messages in thread
From: Alexandre Ghiti @ 2024-02-07 7:28 UTC (permalink / raw)
To: Jisheng Zhang, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Will Deacon, Aneesh Kumar K . V, Andrew Morton, Nick Piggin,
Peter Zijlstra
Cc: linux-riscv, linux-kernel, linux-arch, linux-mm
Hi Jisheng,
On 31/12/2023 07:21, Alexandre Ghiti wrote:
> Hi Jisheng,
>
> On 19/12/2023 18:50, Jisheng Zhang wrote:
>> If a non-leaf PTE (i.e. a pmd, pud or p4d) is modified, an sfence.vma
>> is required for safety: an implementation may cache non-leaf
>> translations in the TLB, and although I haven't met such hardware so
>> far, it is possible in theory.
>>
>> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
>> ---
>> arch/riscv/include/asm/pgalloc.h | 20 +++++++++++++++++---
>> 1 file changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
>> index d169a4f41a2e..a12fb83fa1f5 100644
>> --- a/arch/riscv/include/asm/pgalloc.h
>> +++ b/arch/riscv/include/asm/pgalloc.h
>> @@ -95,7 +95,13 @@ static inline void pud_free(struct mm_struct *mm, pud_t *pud)
>> __pud_free(mm, pud);
>> }
>> -#define __pud_free_tlb(tlb, pud, addr) pud_free((tlb)->mm, pud)
>> +#define __pud_free_tlb(tlb, pud, addr) \
>> +do { \
>> + if (pgtable_l4_enabled) { \
>> + pagetable_pud_dtor(virt_to_ptdesc(pud)); \
>> + tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pud)); \
>
>
> The specification indeed states that an sfence.vma must be emitted
> after a page directory modification. Your change is not enough,
> though, since eventually tlb_flush() is called, and in this function
> we should add:
>
> if (tlb->freed_tables)
> tlb_flush_mm();
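A minimal sketch of what tlb_flush() could look like with that check,
assuming flush_tlb_mm() and flush_tlb_mm_range() as the riscv flush
primitives (illustrative only; see the actual patch linked just
below):

	static inline void tlb_flush(struct mmu_gather *tlb)
	{
		/* a freed page directory needs a full-mm fence, not a ranged one */
		if (tlb->fullmm || tlb->need_flush_all || tlb->freed_tables)
			flush_tlb_mm(tlb->mm);
		else
			flush_tlb_mm_range(tlb->mm, tlb->start, tlb->end,
					   tlb_get_unmap_size(tlb));
	}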
I sent a patch for that here:
https://lore.kernel.org/linux-riscv/20240128120405.25876-1-alexghiti@rivosinc.com/
You can add:
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Thanks,
Alex
>
> otherwise we are not guaranteed that a "global" sfence.vma is called.
>
> Would you be able to benchmark this change and see the performance
> impact?
>
> Thanks,
>
> Alex
>
>
>> + } \
>> +} while (0)
>> #define p4d_alloc_one p4d_alloc_one
>> static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
>> @@ -124,7 +130,11 @@ static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
>> __p4d_free(mm, p4d);
>> }
>> -#define __p4d_free_tlb(tlb, p4d, addr) p4d_free((tlb)->mm, p4d)
>> +#define __p4d_free_tlb(tlb, p4d, addr) \
>> +do { \
>> + if (pgtable_l5_enabled) \
>> + tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(p4d)); \
>> +} while (0)
>> #endif /* __PAGETABLE_PMD_FOLDED */
>> static inline void sync_kernel_mappings(pgd_t *pgd)
>> @@ -149,7 +159,11 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
>> #ifndef __PAGETABLE_PMD_FOLDED
>> -#define __pmd_free_tlb(tlb, pmd, addr) pmd_free((tlb)->mm, pmd)
>> +#define __pmd_free_tlb(tlb, pmd, addr) \
>> +do { \
>> + pagetable_pmd_dtor(virt_to_ptdesc(pmd)); \
>> + tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pmd)); \
>> +} while (0)
>> #endif /* __PAGETABLE_PMD_FOLDED */
>
Thread overview: 19+ messages
2023-12-19 17:50 [PATCH 0/4] riscv: support fast gup Jisheng Zhang
2023-12-19 17:50 ` [PATCH 1/4] riscv: tlb: fix __p*d_free_tlb() Jisheng Zhang
2023-12-31 6:21 ` Alexandre Ghiti
2024-02-07 7:28 ` Alexandre Ghiti
2024-01-04 10:55 ` Alexandre Ghiti
2024-01-10 14:52 ` Palmer Dabbelt
2023-12-19 17:50 ` [PATCH 2/4] riscv: tlb: convert __p*d_free_tlb() to inline functions Jisheng Zhang
2023-12-20 2:59 ` [External] " yunhui cui
2023-12-20 12:57 ` Jisheng Zhang
2023-12-31 6:24 ` Alexandre Ghiti
2023-12-19 17:50 ` [PATCH 3/4] riscv: enable MMU_GATHER_RCU_TABLE_FREE for SMP && MMU Jisheng Zhang
2023-12-31 6:32 ` Alexandre Ghiti
2024-01-02 3:23 ` Jisheng Zhang
2024-01-04 10:45 ` Alexandre Ghiti
2023-12-19 17:50 ` [PATCH 4/4] riscv: enable HAVE_FAST_GUP if MMU Jisheng Zhang
2023-12-31 6:37 ` Alexandre Ghiti
2024-01-02 3:25 ` Jisheng Zhang
2024-01-04 10:46 ` Alexandre Ghiti
2024-01-25 21:30 ` [PATCH 0/4] riscv: support fast gup patchwork-bot+linux-riscv