* [PATCH v3 1/9] riscv: mm: Properly forward vmemmap_populate() altmap parameter
2024-05-21 11:48 [PATCH v3 0/9] riscv: Memory Hot(Un)Plug support Björn Töpel
@ 2024-05-21 11:48 ` Björn Töpel
2024-05-21 12:22 ` Alexandre Ghiti
2024-05-21 11:48 ` [PATCH v3 2/9] riscv: mm: Pre-allocate vmemmap/direct map PGD entries Björn Töpel
` (7 subsequent siblings)
8 siblings, 1 reply; 19+ messages in thread
From: Björn Töpel @ 2024-05-21 11:48 UTC (permalink / raw)
To: Alexandre Ghiti, Albert Ou, David Hildenbrand, Palmer Dabbelt,
Paul Walmsley, linux-riscv, Oscar Salvador
Cc: Björn Töpel, Andrew Bresticker, Chethan Seshadri,
Lorenzo Stoakes, Santosh Mamila, Sivakumar Munnangi, Sunil V L,
linux-kernel, linux-mm, virtualization
From: Björn Töpel <bjorn@rivosinc.com>
Make sure that the altmap parameter is properly passed on to
vmemmap_populate_hugepages().
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
---
arch/riscv/mm/init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 2574f6a3b0e7..b66f846e7634 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -1434,7 +1434,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
* memory hotplug, we are not able to update all the page tables with
* the new PMDs.
*/
- return vmemmap_populate_hugepages(start, end, node, NULL);
+ return vmemmap_populate_hugepages(start, end, node, altmap);
}
#endif
--
2.40.1
* Re: [PATCH v3 1/9] riscv: mm: Properly forward vmemmap_populate() altmap parameter
2024-05-21 11:48 ` [PATCH v3 1/9] riscv: mm: Properly forward vmemmap_populate() altmap parameter Björn Töpel
@ 2024-05-21 12:22 ` Alexandre Ghiti
0 siblings, 0 replies; 19+ messages in thread
From: Alexandre Ghiti @ 2024-05-21 12:22 UTC (permalink / raw)
To: Björn Töpel
Cc: Albert Ou, David Hildenbrand, Palmer Dabbelt, Paul Walmsley,
linux-riscv, Oscar Salvador, Björn Töpel,
Andrew Bresticker, Chethan Seshadri, Lorenzo Stoakes,
Santosh Mamila, Sivakumar Munnangi, Sunil V L, linux-kernel,
linux-mm, virtualization
Hi Björn,
On Tue, May 21, 2024 at 1:48 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> From: Björn Töpel <bjorn@rivosinc.com>
>
> Make sure that the altmap parameter is properly passed on to
> vmemmap_populate_hugepages().
>
> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
> ---
> arch/riscv/mm/init.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 2574f6a3b0e7..b66f846e7634 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -1434,7 +1434,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
> * memory hotplug, we are not able to update all the page tables with
> * the new PMDs.
> */
> - return vmemmap_populate_hugepages(start, end, node, NULL);
> + return vmemmap_populate_hugepages(start, end, node, altmap);
> }
> #endif
>
> --
> 2.40.1
>
You can add:
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Thanks,
Alex
* [PATCH v3 2/9] riscv: mm: Pre-allocate vmemmap/direct map PGD entries
2024-05-21 11:48 [PATCH v3 0/9] riscv: Memory Hot(Un)Plug support Björn Töpel
2024-05-21 11:48 ` [PATCH v3 1/9] riscv: mm: Properly forward vmemmap_populate() altmap parameter Björn Töpel
@ 2024-05-21 11:48 ` Björn Töpel
2024-05-21 16:09 ` Björn Töpel
2024-05-21 11:48 ` [PATCH v3 3/9] riscv: mm: Change attribute from __init to __meminit for page functions Björn Töpel
` (6 subsequent siblings)
8 siblings, 1 reply; 19+ messages in thread
From: Björn Töpel @ 2024-05-21 11:48 UTC (permalink / raw)
To: Alexandre Ghiti, Albert Ou, David Hildenbrand, Palmer Dabbelt,
Paul Walmsley, linux-riscv, Oscar Salvador
Cc: Björn Töpel, Andrew Bresticker, Chethan Seshadri,
Lorenzo Stoakes, Santosh Mamila, Sivakumar Munnangi, Sunil V L,
linux-kernel, linux-mm, virtualization
From: Björn Töpel <bjorn@rivosinc.com>
The RISC-V port copies the PGD table from init_mm/swapper_pg_dir to
all userland page tables, which means that if the PGD level table is
changed, other page tables have to be updated as well.

Instead of having the PGD changes ripple out to all tables, the
synchronization can be avoided altogether by pre-allocating the PGD
entries/pages at boot.

This is currently done for the bpf/modules and vmalloc PGD regions.
Extend this scheme to the PGD regions touched by memory hotplugging.

Prepare the RISC-V port for memory hotplug by pre-allocating
vmemmap/direct map entries at the PGD level. This will waste roughly
128 4K pages when memory hotplugging is enabled in the kernel
configuration.
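
As an illustration only, the pre-allocation idea can be modelled in
user space. This is a minimal sketch, not kernel code: struct toy_pgd,
kernel_pgd and every toy_*() helper are hypothetical names invented for
this example, and the table sizes are arbitrary. It shows that once a
top-level slot is populated before a copy is taken, later changes made
below that slot are visible through the copy without touching it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NR_PGD_ENTRIES 8	/* toy size (assumption), real tables are larger */
#define NEXT_LEVEL_SIZE 64	/* toy size of a next-level table (assumption) */

/* Toy top-level table: each slot is NULL or points at a next-level table. */
struct toy_pgd {
	void *entry[NR_PGD_ENTRIES];
};

/* Hypothetical stand-in for the kernel's reference table. */
static struct toy_pgd kernel_pgd;

/* Populate every empty slot in [first, last] before any copy is taken. */
static bool toy_preallocate_range(size_t first, size_t last)
{
	for (size_t i = first; i <= last && i < NR_PGD_ENTRIES; i++) {
		if (kernel_pgd.entry[i])
			continue;
		kernel_pgd.entry[i] = calloc(1, NEXT_LEVEL_SIZE);
		if (!kernel_pgd.entry[i])
			return false;
	}
	return true;
}

/* A new "process" receives a snapshot of the top-level entries. */
static void toy_copy_pgd(struct toy_pgd *dst)
{
	memcpy(dst, &kernel_pgd, sizeof(*dst));
}

/* A later mapping only writes into the shared next-level table. */
static void toy_map_late(size_t slot, size_t idx, unsigned char val)
{
	((unsigned char *)kernel_pgd.entry[slot])[idx] = val;
}

/* What a process sees when it walks its own copy. */
static unsigned char toy_lookup(const struct toy_pgd *pgd, size_t slot,
				size_t idx)
{
	if (!pgd->entry[slot])
		return 0;
	return ((const unsigned char *)pgd->entry[slot])[idx];
}
```

In this model, a copy taken after toy_preallocate_range() shares the
next-level tables with kernel_pgd, so toy_map_late() never has to visit
the copies. A slot populated only after the copy would be missing from
it, which is the synchronization problem the patch avoids.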
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
---
arch/riscv/include/asm/kasan.h | 4 ++--
arch/riscv/mm/init.c | 7 +++++++
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/include/asm/kasan.h b/arch/riscv/include/asm/kasan.h
index 0b85e363e778..e6a0071bdb56 100644
--- a/arch/riscv/include/asm/kasan.h
+++ b/arch/riscv/include/asm/kasan.h
@@ -6,8 +6,6 @@
#ifndef __ASSEMBLY__
-#ifdef CONFIG_KASAN
-
/*
* The following comment was copied from arm64:
* KASAN_SHADOW_START: beginning of the kernel virtual addresses.
@@ -34,6 +32,8 @@
*/
#define KASAN_SHADOW_START ((KASAN_SHADOW_END - KASAN_SHADOW_SIZE) & PGDIR_MASK)
#define KASAN_SHADOW_END MODULES_LOWEST_VADDR
+
+#ifdef CONFIG_KASAN
#define KASAN_SHADOW_OFFSET _AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
void kasan_init(void);
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index b66f846e7634..c98010ede810 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -27,6 +27,7 @@
#include <asm/fixmap.h>
#include <asm/io.h>
+#include <asm/kasan.h>
#include <asm/numa.h>
#include <asm/pgtable.h>
#include <asm/sections.h>
@@ -1488,10 +1489,16 @@ static void __init preallocate_pgd_pages_range(unsigned long start, unsigned lon
panic("Failed to pre-allocate %s pages for %s area\n", lvl, area);
}
+#define PAGE_END KASAN_SHADOW_START
+
void __init pgtable_cache_init(void)
{
preallocate_pgd_pages_range(VMALLOC_START, VMALLOC_END, "vmalloc");
if (IS_ENABLED(CONFIG_MODULES))
preallocate_pgd_pages_range(MODULES_VADDR, MODULES_END, "bpf/modules");
+ if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) {
+ preallocate_pgd_pages_range(VMEMMAP_START, VMEMMAP_END, "vmemmap");
+ preallocate_pgd_pages_range(PAGE_OFFSET, PAGE_END, "direct map");
+ }
}
#endif
--
2.40.1
* Re: [PATCH v3 2/9] riscv: mm: Pre-allocate vmemmap/direct map PGD entries
2024-05-21 11:48 ` [PATCH v3 2/9] riscv: mm: Pre-allocate vmemmap/direct map PGD entries Björn Töpel
@ 2024-05-21 16:09 ` Björn Töpel
0 siblings, 0 replies; 19+ messages in thread
From: Björn Töpel @ 2024-05-21 16:09 UTC (permalink / raw)
To: Alexandre Ghiti, Albert Ou, David Hildenbrand, Palmer Dabbelt,
Paul Walmsley, linux-riscv, Oscar Salvador
Cc: Björn Töpel, Andrew Bresticker, Chethan Seshadri,
Lorenzo Stoakes, Santosh Mamila, Sivakumar Munnangi, Sunil V L,
linux-kernel, linux-mm, virtualization
Björn Töpel <bjorn@kernel.org> writes:
> From: Björn Töpel <bjorn@rivosinc.com>
>
> The RISC-V port copies the PGD table from init_mm/swapper_pg_dir to
> all userland page tables, which means that if the PGD level table is
> changed, other page tables has to be updated as well.
>
> Instead of having the PGD changes ripple out to all tables, the
> synchronization can be avoided by pre-allocating the PGD entries/pages
> at boot, avoiding the synchronization all together.
>
> This is currently done for the bpf/modules, and vmalloc PGD regions.
> Extend this scheme for the PGD regions touched by memory hotplugging.
>
> Prepare the RISC-V port for memory hotplug by pre-allocate
> vmemmap/direct map entries at the PGD level. This will roughly waste
> ~128 worth of 4K pages when memory hotplugging is enabled in the
> kernel configuration.
>
> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
> ---
> arch/riscv/include/asm/kasan.h | 4 ++--
> arch/riscv/mm/init.c | 7 +++++++
> 2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/include/asm/kasan.h b/arch/riscv/include/asm/kasan.h
> index 0b85e363e778..e6a0071bdb56 100644
> --- a/arch/riscv/include/asm/kasan.h
> +++ b/arch/riscv/include/asm/kasan.h
> @@ -6,8 +6,6 @@
>
> #ifndef __ASSEMBLY__
>
> -#ifdef CONFIG_KASAN
> -
> /*
> * The following comment was copied from arm64:
> * KASAN_SHADOW_START: beginning of the kernel virtual addresses.
> @@ -34,6 +32,8 @@
> */
> #define KASAN_SHADOW_START ((KASAN_SHADOW_END - KASAN_SHADOW_SIZE) & PGDIR_MASK)
> #define KASAN_SHADOW_END MODULES_LOWEST_VADDR
> +
> +#ifdef CONFIG_KASAN
> #define KASAN_SHADOW_OFFSET _AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
>
> void kasan_init(void);
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index b66f846e7634..c98010ede810 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -27,6 +27,7 @@
>
> #include <asm/fixmap.h>
> #include <asm/io.h>
> +#include <asm/kasan.h>
> #include <asm/numa.h>
> #include <asm/pgtable.h>
> #include <asm/sections.h>
> @@ -1488,10 +1489,16 @@ static void __init preallocate_pgd_pages_range(unsigned long start, unsigned lon
> panic("Failed to pre-allocate %s pages for %s area\n", lvl, area);
> }
>
> +#define PAGE_END KASAN_SHADOW_START
> +
> void __init pgtable_cache_init(void)
> {
> preallocate_pgd_pages_range(VMALLOC_START, VMALLOC_END, "vmalloc");
> if (IS_ENABLED(CONFIG_MODULES))
> preallocate_pgd_pages_range(MODULES_VADDR, MODULES_END, "bpf/modules");
> + if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) {
> + preallocate_pgd_pages_range(VMEMMAP_START, VMEMMAP_END, "vmemmap");
> + preallocate_pgd_pages_range(PAGE_OFFSET, PAGE_END, "direct map");
Alex pointed out that KASAN PGDs should be preallocated as well! I'll
address this in the next revision.
Björn
* [PATCH v3 3/9] riscv: mm: Change attribute from __init to __meminit for page functions
2024-05-21 11:48 [PATCH v3 0/9] riscv: Memory Hot(Un)Plug support Björn Töpel
2024-05-21 11:48 ` [PATCH v3 1/9] riscv: mm: Properly forward vmemmap_populate() altmap parameter Björn Töpel
2024-05-21 11:48 ` [PATCH v3 2/9] riscv: mm: Pre-allocate vmemmap/direct map PGD entries Björn Töpel
@ 2024-05-21 11:48 ` Björn Töpel
2024-05-21 12:35 ` Alexandre Ghiti
2024-05-21 11:48 ` [PATCH v3 4/9] riscv: mm: Refactor create_linear_mapping_range() for memory hot add Björn Töpel
` (5 subsequent siblings)
8 siblings, 1 reply; 19+ messages in thread
From: Björn Töpel @ 2024-05-21 11:48 UTC (permalink / raw)
To: Alexandre Ghiti, Albert Ou, David Hildenbrand, Palmer Dabbelt,
Paul Walmsley, linux-riscv, Oscar Salvador
Cc: Björn Töpel, Andrew Bresticker, Chethan Seshadri,
Lorenzo Stoakes, Santosh Mamila, Sivakumar Munnangi, Sunil V L,
linux-kernel, linux-mm, virtualization
From: Björn Töpel <bjorn@rivosinc.com>
Prepare for memory hotplugging support by changing from __init to
__meminit for the page table functions that are used by the upcoming
architecture-specific callbacks.

Changing the __init attribute to __meminit prevents the functions from
being removed after init. The __meminit attribute makes sure the
functions are kept in the kernel text post init, but only if memory
hotplugging is enabled for the build.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
---
arch/riscv/include/asm/mmu.h | 4 +--
arch/riscv/include/asm/pgtable.h | 2 +-
arch/riscv/mm/init.c | 56 ++++++++++++++------------------
3 files changed, 28 insertions(+), 34 deletions(-)
diff --git a/arch/riscv/include/asm/mmu.h b/arch/riscv/include/asm/mmu.h
index 947fd60f9051..c9e03e9da3dc 100644
--- a/arch/riscv/include/asm/mmu.h
+++ b/arch/riscv/include/asm/mmu.h
@@ -31,8 +31,8 @@ typedef struct {
#define cntx2asid(cntx) ((cntx) & SATP_ASID_MASK)
#define cntx2version(cntx) ((cntx) & ~SATP_ASID_MASK)
-void __init create_pgd_mapping(pgd_t *pgdp, uintptr_t va, phys_addr_t pa,
- phys_addr_t sz, pgprot_t prot);
+void __meminit create_pgd_mapping(pgd_t *pgdp, uintptr_t va, phys_addr_t pa, phys_addr_t sz,
+ pgprot_t prot);
#endif /* __ASSEMBLY__ */
#endif /* _ASM_RISCV_MMU_H */
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 58fd7b70b903..7933f493db71 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -162,7 +162,7 @@ struct pt_alloc_ops {
#endif
};
-extern struct pt_alloc_ops pt_ops __initdata;
+extern struct pt_alloc_ops pt_ops __meminitdata;
#ifdef CONFIG_MMU
/* Number of PGD entries that a user-mode program can use */
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index c98010ede810..c969427eab88 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -295,7 +295,7 @@ static void __init setup_bootmem(void)
}
#ifdef CONFIG_MMU
-struct pt_alloc_ops pt_ops __initdata;
+struct pt_alloc_ops pt_ops __meminitdata;
pgd_t swapper_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
@@ -357,7 +357,7 @@ static inline pte_t *__init get_pte_virt_fixmap(phys_addr_t pa)
return (pte_t *)set_fixmap_offset(FIX_PTE, pa);
}
-static inline pte_t *__init get_pte_virt_late(phys_addr_t pa)
+static inline pte_t *__meminit get_pte_virt_late(phys_addr_t pa)
{
return (pte_t *) __va(pa);
}
@@ -376,7 +376,7 @@ static inline phys_addr_t __init alloc_pte_fixmap(uintptr_t va)
return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
}
-static phys_addr_t __init alloc_pte_late(uintptr_t va)
+static phys_addr_t __meminit alloc_pte_late(uintptr_t va)
{
struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM, 0);
@@ -384,9 +384,8 @@ static phys_addr_t __init alloc_pte_late(uintptr_t va)
return __pa((pte_t *)ptdesc_address(ptdesc));
}
-static void __init create_pte_mapping(pte_t *ptep,
- uintptr_t va, phys_addr_t pa,
- phys_addr_t sz, pgprot_t prot)
+static void __meminit create_pte_mapping(pte_t *ptep, uintptr_t va, phys_addr_t pa, phys_addr_t sz,
+ pgprot_t prot)
{
uintptr_t pte_idx = pte_index(va);
@@ -440,7 +439,7 @@ static pmd_t *__init get_pmd_virt_fixmap(phys_addr_t pa)
return (pmd_t *)set_fixmap_offset(FIX_PMD, pa);
}
-static pmd_t *__init get_pmd_virt_late(phys_addr_t pa)
+static pmd_t *__meminit get_pmd_virt_late(phys_addr_t pa)
{
return (pmd_t *) __va(pa);
}
@@ -457,7 +456,7 @@ static phys_addr_t __init alloc_pmd_fixmap(uintptr_t va)
return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
}
-static phys_addr_t __init alloc_pmd_late(uintptr_t va)
+static phys_addr_t __meminit alloc_pmd_late(uintptr_t va)
{
struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM, 0);
@@ -465,9 +464,9 @@ static phys_addr_t __init alloc_pmd_late(uintptr_t va)
return __pa((pmd_t *)ptdesc_address(ptdesc));
}
-static void __init create_pmd_mapping(pmd_t *pmdp,
- uintptr_t va, phys_addr_t pa,
- phys_addr_t sz, pgprot_t prot)
+static void __meminit create_pmd_mapping(pmd_t *pmdp,
+ uintptr_t va, phys_addr_t pa,
+ phys_addr_t sz, pgprot_t prot)
{
pte_t *ptep;
phys_addr_t pte_phys;
@@ -503,7 +502,7 @@ static pud_t *__init get_pud_virt_fixmap(phys_addr_t pa)
return (pud_t *)set_fixmap_offset(FIX_PUD, pa);
}
-static pud_t *__init get_pud_virt_late(phys_addr_t pa)
+static pud_t *__meminit get_pud_virt_late(phys_addr_t pa)
{
return (pud_t *)__va(pa);
}
@@ -521,7 +520,7 @@ static phys_addr_t __init alloc_pud_fixmap(uintptr_t va)
return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
}
-static phys_addr_t alloc_pud_late(uintptr_t va)
+static phys_addr_t __meminit alloc_pud_late(uintptr_t va)
{
unsigned long vaddr;
@@ -541,7 +540,7 @@ static p4d_t *__init get_p4d_virt_fixmap(phys_addr_t pa)
return (p4d_t *)set_fixmap_offset(FIX_P4D, pa);
}
-static p4d_t *__init get_p4d_virt_late(phys_addr_t pa)
+static p4d_t *__meminit get_p4d_virt_late(phys_addr_t pa)
{
return (p4d_t *)__va(pa);
}
@@ -559,7 +558,7 @@ static phys_addr_t __init alloc_p4d_fixmap(uintptr_t va)
return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
}
-static phys_addr_t alloc_p4d_late(uintptr_t va)
+static phys_addr_t __meminit alloc_p4d_late(uintptr_t va)
{
unsigned long vaddr;
@@ -568,9 +567,8 @@ static phys_addr_t alloc_p4d_late(uintptr_t va)
return __pa(vaddr);
}
-static void __init create_pud_mapping(pud_t *pudp,
- uintptr_t va, phys_addr_t pa,
- phys_addr_t sz, pgprot_t prot)
+static void __meminit create_pud_mapping(pud_t *pudp, uintptr_t va, phys_addr_t pa, phys_addr_t sz,
+ pgprot_t prot)
{
pmd_t *nextp;
phys_addr_t next_phys;
@@ -595,9 +593,8 @@ static void __init create_pud_mapping(pud_t *pudp,
create_pmd_mapping(nextp, va, pa, sz, prot);
}
-static void __init create_p4d_mapping(p4d_t *p4dp,
- uintptr_t va, phys_addr_t pa,
- phys_addr_t sz, pgprot_t prot)
+static void __meminit create_p4d_mapping(p4d_t *p4dp, uintptr_t va, phys_addr_t pa, phys_addr_t sz,
+ pgprot_t prot)
{
pud_t *nextp;
phys_addr_t next_phys;
@@ -653,9 +650,8 @@ static void __init create_p4d_mapping(p4d_t *p4dp,
#define create_pmd_mapping(__pmdp, __va, __pa, __sz, __prot) do {} while(0)
#endif /* __PAGETABLE_PMD_FOLDED */
-void __init create_pgd_mapping(pgd_t *pgdp,
- uintptr_t va, phys_addr_t pa,
- phys_addr_t sz, pgprot_t prot)
+void __meminit create_pgd_mapping(pgd_t *pgdp, uintptr_t va, phys_addr_t pa, phys_addr_t sz,
+ pgprot_t prot)
{
pgd_next_t *nextp;
phys_addr_t next_phys;
@@ -680,8 +676,7 @@ void __init create_pgd_mapping(pgd_t *pgdp,
create_pgd_next_mapping(nextp, va, pa, sz, prot);
}
-static uintptr_t __init best_map_size(phys_addr_t pa, uintptr_t va,
- phys_addr_t size)
+static uintptr_t __meminit best_map_size(phys_addr_t pa, uintptr_t va, phys_addr_t size)
{
if (pgtable_l5_enabled &&
!(pa & (P4D_SIZE - 1)) && !(va & (P4D_SIZE - 1)) && size >= P4D_SIZE)
@@ -714,7 +709,7 @@ asmlinkage void __init __copy_data(void)
#endif
#ifdef CONFIG_STRICT_KERNEL_RWX
-static __init pgprot_t pgprot_from_va(uintptr_t va)
+static __meminit pgprot_t pgprot_from_va(uintptr_t va)
{
if (is_va_kernel_text(va))
return PAGE_KERNEL_READ_EXEC;
@@ -739,7 +734,7 @@ void mark_rodata_ro(void)
set_memory_ro);
}
#else
-static __init pgprot_t pgprot_from_va(uintptr_t va)
+static __meminit pgprot_t pgprot_from_va(uintptr_t va)
{
if (IS_ENABLED(CONFIG_64BIT) && !is_kernel_mapping(va))
return PAGE_KERNEL;
@@ -1231,9 +1226,8 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
pt_ops_set_fixmap();
}
-static void __init create_linear_mapping_range(phys_addr_t start,
- phys_addr_t end,
- uintptr_t fixed_map_size)
+static void __meminit create_linear_mapping_range(phys_addr_t start, phys_addr_t end,
+ uintptr_t fixed_map_size)
{
phys_addr_t pa;
uintptr_t va, map_size;
--
2.40.1
* Re: [PATCH v3 3/9] riscv: mm: Change attribute from __init to __meminit for page functions
2024-05-21 11:48 ` [PATCH v3 3/9] riscv: mm: Change attribute from __init to __meminit for page functions Björn Töpel
@ 2024-05-21 12:35 ` Alexandre Ghiti
0 siblings, 0 replies; 19+ messages in thread
From: Alexandre Ghiti @ 2024-05-21 12:35 UTC (permalink / raw)
To: Björn Töpel
Cc: Albert Ou, David Hildenbrand, Palmer Dabbelt, Paul Walmsley,
linux-riscv, Oscar Salvador, Björn Töpel,
Andrew Bresticker, Chethan Seshadri, Lorenzo Stoakes,
Santosh Mamila, Sivakumar Munnangi, Sunil V L, linux-kernel,
linux-mm, virtualization
On Tue, May 21, 2024 at 1:48 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> From: Björn Töpel <bjorn@rivosinc.com>
>
> Prepare for memory hotplugging support by changing from __init to
> __meminit for the page table functions that are used by the upcoming
> architecture specific callbacks.
>
> Changing the __init attribute to __meminit, avoids that the functions
> are removed after init. The __meminit attribute makes sure the
> functions are kept in the kernel text post init, but only if memory
> hotplugging is enabled for the build.
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
> ---
> arch/riscv/include/asm/mmu.h | 4 +--
> arch/riscv/include/asm/pgtable.h | 2 +-
> arch/riscv/mm/init.c | 56 ++++++++++++++------------------
> 3 files changed, 28 insertions(+), 34 deletions(-)
>
> diff --git a/arch/riscv/include/asm/mmu.h b/arch/riscv/include/asm/mmu.h
> index 947fd60f9051..c9e03e9da3dc 100644
> --- a/arch/riscv/include/asm/mmu.h
> +++ b/arch/riscv/include/asm/mmu.h
> @@ -31,8 +31,8 @@ typedef struct {
> #define cntx2asid(cntx) ((cntx) & SATP_ASID_MASK)
> #define cntx2version(cntx) ((cntx) & ~SATP_ASID_MASK)
>
> -void __init create_pgd_mapping(pgd_t *pgdp, uintptr_t va, phys_addr_t pa,
> - phys_addr_t sz, pgprot_t prot);
> +void __meminit create_pgd_mapping(pgd_t *pgdp, uintptr_t va, phys_addr_t pa, phys_addr_t sz,
> + pgprot_t prot);
> #endif /* __ASSEMBLY__ */
>
> #endif /* _ASM_RISCV_MMU_H */
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 58fd7b70b903..7933f493db71 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -162,7 +162,7 @@ struct pt_alloc_ops {
> #endif
> };
>
> -extern struct pt_alloc_ops pt_ops __initdata;
> +extern struct pt_alloc_ops pt_ops __meminitdata;
>
> #ifdef CONFIG_MMU
> /* Number of PGD entries that a user-mode program can use */
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index c98010ede810..c969427eab88 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -295,7 +295,7 @@ static void __init setup_bootmem(void)
> }
>
> #ifdef CONFIG_MMU
> -struct pt_alloc_ops pt_ops __initdata;
> +struct pt_alloc_ops pt_ops __meminitdata;
>
> pgd_t swapper_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
> pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
> @@ -357,7 +357,7 @@ static inline pte_t *__init get_pte_virt_fixmap(phys_addr_t pa)
> return (pte_t *)set_fixmap_offset(FIX_PTE, pa);
> }
>
> -static inline pte_t *__init get_pte_virt_late(phys_addr_t pa)
> +static inline pte_t *__meminit get_pte_virt_late(phys_addr_t pa)
> {
> return (pte_t *) __va(pa);
> }
> @@ -376,7 +376,7 @@ static inline phys_addr_t __init alloc_pte_fixmap(uintptr_t va)
> return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
> }
>
> -static phys_addr_t __init alloc_pte_late(uintptr_t va)
> +static phys_addr_t __meminit alloc_pte_late(uintptr_t va)
> {
> struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM, 0);
>
> @@ -384,9 +384,8 @@ static phys_addr_t __init alloc_pte_late(uintptr_t va)
> return __pa((pte_t *)ptdesc_address(ptdesc));
> }
>
> -static void __init create_pte_mapping(pte_t *ptep,
> - uintptr_t va, phys_addr_t pa,
> - phys_addr_t sz, pgprot_t prot)
> +static void __meminit create_pte_mapping(pte_t *ptep, uintptr_t va, phys_addr_t pa, phys_addr_t sz,
> + pgprot_t prot)
> {
> uintptr_t pte_idx = pte_index(va);
>
> @@ -440,7 +439,7 @@ static pmd_t *__init get_pmd_virt_fixmap(phys_addr_t pa)
> return (pmd_t *)set_fixmap_offset(FIX_PMD, pa);
> }
>
> -static pmd_t *__init get_pmd_virt_late(phys_addr_t pa)
> +static pmd_t *__meminit get_pmd_virt_late(phys_addr_t pa)
> {
> return (pmd_t *) __va(pa);
> }
> @@ -457,7 +456,7 @@ static phys_addr_t __init alloc_pmd_fixmap(uintptr_t va)
> return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
> }
>
> -static phys_addr_t __init alloc_pmd_late(uintptr_t va)
> +static phys_addr_t __meminit alloc_pmd_late(uintptr_t va)
> {
> struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM, 0);
>
> @@ -465,9 +464,9 @@ static phys_addr_t __init alloc_pmd_late(uintptr_t va)
> return __pa((pmd_t *)ptdesc_address(ptdesc));
> }
>
> -static void __init create_pmd_mapping(pmd_t *pmdp,
> - uintptr_t va, phys_addr_t pa,
> - phys_addr_t sz, pgprot_t prot)
> +static void __meminit create_pmd_mapping(pmd_t *pmdp,
> + uintptr_t va, phys_addr_t pa,
> + phys_addr_t sz, pgprot_t prot)
> {
> pte_t *ptep;
> phys_addr_t pte_phys;
> @@ -503,7 +502,7 @@ static pud_t *__init get_pud_virt_fixmap(phys_addr_t pa)
> return (pud_t *)set_fixmap_offset(FIX_PUD, pa);
> }
>
> -static pud_t *__init get_pud_virt_late(phys_addr_t pa)
> +static pud_t *__meminit get_pud_virt_late(phys_addr_t pa)
> {
> return (pud_t *)__va(pa);
> }
> @@ -521,7 +520,7 @@ static phys_addr_t __init alloc_pud_fixmap(uintptr_t va)
> return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
> }
>
> -static phys_addr_t alloc_pud_late(uintptr_t va)
> +static phys_addr_t __meminit alloc_pud_late(uintptr_t va)
> {
> unsigned long vaddr;
>
> @@ -541,7 +540,7 @@ static p4d_t *__init get_p4d_virt_fixmap(phys_addr_t pa)
> return (p4d_t *)set_fixmap_offset(FIX_P4D, pa);
> }
>
> -static p4d_t *__init get_p4d_virt_late(phys_addr_t pa)
> +static p4d_t *__meminit get_p4d_virt_late(phys_addr_t pa)
> {
> return (p4d_t *)__va(pa);
> }
> @@ -559,7 +558,7 @@ static phys_addr_t __init alloc_p4d_fixmap(uintptr_t va)
> return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
> }
>
> -static phys_addr_t alloc_p4d_late(uintptr_t va)
> +static phys_addr_t __meminit alloc_p4d_late(uintptr_t va)
> {
> unsigned long vaddr;
>
> @@ -568,9 +567,8 @@ static phys_addr_t alloc_p4d_late(uintptr_t va)
> return __pa(vaddr);
> }
>
> -static void __init create_pud_mapping(pud_t *pudp,
> - uintptr_t va, phys_addr_t pa,
> - phys_addr_t sz, pgprot_t prot)
> +static void __meminit create_pud_mapping(pud_t *pudp, uintptr_t va, phys_addr_t pa, phys_addr_t sz,
> + pgprot_t prot)
> {
> pmd_t *nextp;
> phys_addr_t next_phys;
> @@ -595,9 +593,8 @@ static void __init create_pud_mapping(pud_t *pudp,
> create_pmd_mapping(nextp, va, pa, sz, prot);
> }
>
> -static void __init create_p4d_mapping(p4d_t *p4dp,
> - uintptr_t va, phys_addr_t pa,
> - phys_addr_t sz, pgprot_t prot)
> +static void __meminit create_p4d_mapping(p4d_t *p4dp, uintptr_t va, phys_addr_t pa, phys_addr_t sz,
> + pgprot_t prot)
> {
> pud_t *nextp;
> phys_addr_t next_phys;
> @@ -653,9 +650,8 @@ static void __init create_p4d_mapping(p4d_t *p4dp,
> #define create_pmd_mapping(__pmdp, __va, __pa, __sz, __prot) do {} while(0)
> #endif /* __PAGETABLE_PMD_FOLDED */
>
> -void __init create_pgd_mapping(pgd_t *pgdp,
> - uintptr_t va, phys_addr_t pa,
> - phys_addr_t sz, pgprot_t prot)
> +void __meminit create_pgd_mapping(pgd_t *pgdp, uintptr_t va, phys_addr_t pa, phys_addr_t sz,
> + pgprot_t prot)
> {
> pgd_next_t *nextp;
> phys_addr_t next_phys;
> @@ -680,8 +676,7 @@ void __init create_pgd_mapping(pgd_t *pgdp,
> create_pgd_next_mapping(nextp, va, pa, sz, prot);
> }
>
> -static uintptr_t __init best_map_size(phys_addr_t pa, uintptr_t va,
> - phys_addr_t size)
> +static uintptr_t __meminit best_map_size(phys_addr_t pa, uintptr_t va, phys_addr_t size)
> {
> if (pgtable_l5_enabled &&
> !(pa & (P4D_SIZE - 1)) && !(va & (P4D_SIZE - 1)) && size >= P4D_SIZE)
> @@ -714,7 +709,7 @@ asmlinkage void __init __copy_data(void)
> #endif
>
> #ifdef CONFIG_STRICT_KERNEL_RWX
> -static __init pgprot_t pgprot_from_va(uintptr_t va)
> +static __meminit pgprot_t pgprot_from_va(uintptr_t va)
> {
> if (is_va_kernel_text(va))
> return PAGE_KERNEL_READ_EXEC;
> @@ -739,7 +734,7 @@ void mark_rodata_ro(void)
> set_memory_ro);
> }
> #else
> -static __init pgprot_t pgprot_from_va(uintptr_t va)
> +static __meminit pgprot_t pgprot_from_va(uintptr_t va)
> {
> if (IS_ENABLED(CONFIG_64BIT) && !is_kernel_mapping(va))
> return PAGE_KERNEL;
> @@ -1231,9 +1226,8 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
> pt_ops_set_fixmap();
> }
>
> -static void __init create_linear_mapping_range(phys_addr_t start,
> - phys_addr_t end,
> - uintptr_t fixed_map_size)
> +static void __meminit create_linear_mapping_range(phys_addr_t start, phys_addr_t end,
> + uintptr_t fixed_map_size)
> {
> phys_addr_t pa;
> uintptr_t va, map_size;
> --
> 2.40.1
>
You can add:
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Thanks,
Alex
* [PATCH v3 4/9] riscv: mm: Refactor create_linear_mapping_range() for memory hot add
2024-05-21 11:48 [PATCH v3 0/9] riscv: Memory Hot(Un)Plug support Björn Töpel
` (2 preceding siblings ...)
2024-05-21 11:48 ` [PATCH v3 3/9] riscv: mm: Change attribute from __init to __meminit for page functions Björn Töpel
@ 2024-05-21 11:48 ` Björn Töpel
2024-05-21 11:48 ` [PATCH v3 5/9] riscv: mm: Add memory hotplugging support Björn Töpel
` (4 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Björn Töpel @ 2024-05-21 11:48 UTC (permalink / raw)
To: Alexandre Ghiti, Albert Ou, David Hildenbrand, Palmer Dabbelt,
Paul Walmsley, linux-riscv, Oscar Salvador
Cc: Björn Töpel, Andrew Bresticker, Chethan Seshadri,
Lorenzo Stoakes, Santosh Mamila, Sivakumar Munnangi, Sunil V L,
linux-kernel, linux-mm, virtualization
From: Björn Töpel <bjorn@rivosinc.com>
Add a parameter to the direct map setup function, so it can be used in
arch_add_memory() later.
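
The calling convention introduced here (NULL keeps the default
protection, a non-NULL pointer overrides it) can be sketched in user
space as follows. This is only a model: toy_pgprot_t, the TOY_PROT_*
values and the address rule in toy_prot_from_va() are assumptions made
up for the example and do not reflect the real pgprot_from_va().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's pgprot_t. */
typedef struct {
	unsigned long bits;
} toy_pgprot_t;

#define TOY_PROT_RW 0x3UL	/* toy encoding (assumption) */
#define TOY_PROT_RX 0x5UL	/* toy encoding (assumption) */

/* Default policy: derive the protection from the address (toy rule). */
static toy_pgprot_t toy_prot_from_va(uintptr_t va)
{
	toy_pgprot_t prot = {
		.bits = (va < 0x1000) ? TOY_PROT_RX : TOY_PROT_RW,
	};

	return prot;
}

/* NULL selects the default policy; a non-NULL pointer overrides it. */
static toy_pgprot_t toy_pick_prot(uintptr_t va, const toy_pgprot_t *override)
{
	return override ? *override : toy_prot_from_va(va);
}
```

Passing a pointer rather than a value lets existing callers state "no
preference" with NULL, which is why the boot-time call sites in the
diff below only gain a trailing NULL argument.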
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
---
arch/riscv/mm/init.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index c969427eab88..6f72b0b2b854 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -1227,7 +1227,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
}
static void __meminit create_linear_mapping_range(phys_addr_t start, phys_addr_t end,
- uintptr_t fixed_map_size)
+ uintptr_t fixed_map_size, const pgprot_t *pgprot)
{
phys_addr_t pa;
uintptr_t va, map_size;
@@ -1238,7 +1238,7 @@ static void __meminit create_linear_mapping_range(phys_addr_t start, phys_addr_t
best_map_size(pa, va, end - pa);
create_pgd_mapping(swapper_pg_dir, va, pa, map_size,
- pgprot_from_va(va));
+ pgprot ? *pgprot : pgprot_from_va(va));
}
}
@@ -1282,22 +1282,19 @@ static void __init create_linear_mapping_page_table(void)
if (end >= __pa(PAGE_OFFSET) + memory_limit)
end = __pa(PAGE_OFFSET) + memory_limit;
- create_linear_mapping_range(start, end, 0);
+ create_linear_mapping_range(start, end, 0, NULL);
}
#ifdef CONFIG_STRICT_KERNEL_RWX
- create_linear_mapping_range(ktext_start, ktext_start + ktext_size, 0);
- create_linear_mapping_range(krodata_start,
- krodata_start + krodata_size, 0);
+ create_linear_mapping_range(ktext_start, ktext_start + ktext_size, 0, NULL);
+ create_linear_mapping_range(krodata_start, krodata_start + krodata_size, 0, NULL);
memblock_clear_nomap(ktext_start, ktext_size);
memblock_clear_nomap(krodata_start, krodata_size);
#endif
#ifdef CONFIG_KFENCE
- create_linear_mapping_range(kfence_pool,
- kfence_pool + KFENCE_POOL_SIZE,
- PAGE_SIZE);
+ create_linear_mapping_range(kfence_pool, kfence_pool + KFENCE_POOL_SIZE, PAGE_SIZE, NULL);
memblock_clear_nomap(kfence_pool, KFENCE_POOL_SIZE);
#endif
--
2.40.1
* [PATCH v3 5/9] riscv: mm: Add memory hotplugging support
2024-05-21 11:48 [PATCH v3 0/9] riscv: Memory Hot(Un)Plug support Björn Töpel
` (3 preceding siblings ...)
2024-05-21 11:48 ` [PATCH v3 4/9] riscv: mm: Refactor create_linear_mapping_range() for memory hot add Björn Töpel
@ 2024-05-21 11:48 ` Björn Töpel
2024-05-21 13:19 ` Alexandre Ghiti
2024-05-21 11:48 ` [PATCH v3 6/9] riscv: mm: Take memory hotplug read-lock during kernel page table dump Björn Töpel
` (3 subsequent siblings)
8 siblings, 1 reply; 19+ messages in thread
From: Björn Töpel @ 2024-05-21 11:48 UTC (permalink / raw)
To: Alexandre Ghiti, Albert Ou, David Hildenbrand, Palmer Dabbelt,
Paul Walmsley, linux-riscv, Oscar Salvador
Cc: Björn Töpel, Andrew Bresticker, Chethan Seshadri,
Lorenzo Stoakes, Santosh Mamila, Sivakumar Munnangi, Sunil V L,
linux-kernel, linux-mm, virtualization
From: Björn Töpel <bjorn@rivosinc.com>
For an architecture to support memory hotplugging, a couple of
callbacks needs to be implemented:
arch_add_memory()
This callback is responsible for adding the physical memory into the
direct map, and for calling into the generic memory hotplugging code
via __add_pages(), which adds the corresponding struct page entries
and updates the vmemmap mapping.
arch_remove_memory()
This is the inverse of the callback above.
vmemmap_free()
This function tears down the vmemmap mappings (if
CONFIG_SPARSEMEM_VMEMMAP is enabled), and also deallocates the
backing vmemmap pages. Note that for persistent memory, an
alternative allocator for the backing pages can be used: the
vmem_altmap. This means that when the backing pages are cleared,
extra care is needed so that the correct deallocation method is
used.
arch_get_mappable_range()
This function returns the PA range that the direct map can map.
Used by the MHP internals for sanity checks.
The page table unmap/teardown functions are heavily based on code from
the x86 tree. The same remove_pgd_mapping() function is used in both
vmemmap_free() and arch_remove_memory(), but in the latter function
the backing pages are not removed.
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
---
arch/riscv/mm/init.c | 261 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 261 insertions(+)
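
As a rough illustration of the teardown rule shared by free_pte_table(),
free_pmd_table() and free_pud_table() in this patch, the userspace sketch
below frees a child table only once every entry in it is empty, and then
clears the parent slot. The sketch_* names, the 512-entry table size and
the heap allocation are assumptions made for illustration; none of them
are part of the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed table geometry: 4 KiB tables holding 8-byte entries. */
#define SKETCH_PTRS_PER_TABLE 512

typedef unsigned long sketch_entry_t;

/* Returns true when no entry in the table is populated. */
bool sketch_table_empty(const sketch_entry_t *table)
{
	for (size_t i = 0; i < SKETCH_PTRS_PER_TABLE; i++) {
		if (table[i] != 0)
			return false;
	}
	return true;
}

/*
 * Frees the child table and clears the parent slot, but only if the
 * child holds no live entries. Returns true if the table was freed.
 */
bool sketch_free_table_if_empty(sketch_entry_t **parent_slot)
{
	assert(parent_slot != NULL && *parent_slot != NULL);

	if (!sketch_table_empty(*parent_slot))
		return false;

	free(*parent_slot);
	*parent_slot = NULL;
	return true;
}
```

The early return keeps a partially populated table alive, which is why a
range removal that covers only part of a table leaves the parent entry
untouched.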
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 6f72b0b2b854..6693b742bf2f 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -1493,3 +1493,264 @@ void __init pgtable_cache_init(void)
}
}
#endif
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static void __meminit free_pagetable(struct page *page, int order)
+{
+ unsigned int nr_pages = 1 << order;
+
+ /*
+ * vmemmap/direct page tables can be reserved, if added at
+ * boot.
+ */
+ if (PageReserved(page)) {
+ __ClearPageReserved(page);
+ while (nr_pages--)
+ free_reserved_page(page++);
+ return;
+ }
+
+ free_pages((unsigned long)page_address(page), order);
+}
+
+static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
+{
+ pte_t *pte;
+ int i;
+
+ for (i = 0; i < PTRS_PER_PTE; i++) {
+ pte = pte_start + i;
+ if (!pte_none(*pte))
+ return;
+ }
+
+ free_pagetable(pmd_page(*pmd), 0);
+ pmd_clear(pmd);
+}
+
+static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
+{
+ pmd_t *pmd;
+ int i;
+
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ pmd = pmd_start + i;
+ if (!pmd_none(*pmd))
+ return;
+ }
+
+ free_pagetable(pud_page(*pud), 0);
+ pud_clear(pud);
+}
+
+static void __meminit free_pud_table(pud_t *pud_start, p4d_t *p4d)
+{
+ pud_t *pud;
+ int i;
+
+ for (i = 0; i < PTRS_PER_PUD; i++) {
+ pud = pud_start + i;
+ if (!pud_none(*pud))
+ return;
+ }
+
+ free_pagetable(p4d_page(*p4d), 0);
+ p4d_clear(p4d);
+}
+
+static void __meminit free_vmemmap_storage(struct page *page, size_t size,
+ struct vmem_altmap *altmap)
+{
+ if (altmap)
+ vmem_altmap_free(altmap, size >> PAGE_SHIFT);
+ else
+ free_pagetable(page, get_order(size));
+}
+
+static void __meminit remove_pte_mapping(pte_t *pte_base, unsigned long addr, unsigned long end,
+ bool is_vmemmap, struct vmem_altmap *altmap)
+{
+ unsigned long next;
+ pte_t *ptep, pte;
+
+ for (; addr < end; addr = next) {
+ next = (addr + PAGE_SIZE) & PAGE_MASK;
+ if (next > end)
+ next = end;
+
+ ptep = pte_base + pte_index(addr);
+ pte = READ_ONCE(*ptep);
+
+ if (!pte_present(*ptep))
+ continue;
+
+ pte_clear(&init_mm, addr, ptep);
+ if (is_vmemmap)
+ free_vmemmap_storage(pte_page(pte), PAGE_SIZE, altmap);
+ }
+}
+
+static void __meminit remove_pmd_mapping(pmd_t *pmd_base, unsigned long addr, unsigned long end,
+ bool is_vmemmap, struct vmem_altmap *altmap)
+{
+ unsigned long next;
+ pte_t *pte_base;
+ pmd_t *pmdp, pmd;
+
+ for (; addr < end; addr = next) {
+ next = pmd_addr_end(addr, end);
+ pmdp = pmd_base + pmd_index(addr);
+ pmd = READ_ONCE(*pmdp);
+
+ if (!pmd_present(pmd))
+ continue;
+
+ if (pmd_leaf(pmd)) {
+ pmd_clear(pmdp);
+ if (is_vmemmap)
+ free_vmemmap_storage(pmd_page(pmd), PMD_SIZE, altmap);
+ continue;
+ }
+
+ pte_base = (pte_t *)pmd_page_vaddr(*pmdp);
+ remove_pte_mapping(pte_base, addr, next, is_vmemmap, altmap);
+ free_pte_table(pte_base, pmdp);
+ }
+}
+
+static void __meminit remove_pud_mapping(pud_t *pud_base, unsigned long addr, unsigned long end,
+ bool is_vmemmap, struct vmem_altmap *altmap)
+{
+ unsigned long next;
+ pud_t *pudp, pud;
+ pmd_t *pmd_base;
+
+ for (; addr < end; addr = next) {
+ next = pud_addr_end(addr, end);
+ pudp = pud_base + pud_index(addr);
+ pud = READ_ONCE(*pudp);
+
+ if (!pud_present(pud))
+ continue;
+
+ if (pud_leaf(pud)) {
+ if (pgtable_l4_enabled) {
+ pud_clear(pudp);
+ if (is_vmemmap)
+ free_vmemmap_storage(pud_page(pud), PUD_SIZE, altmap);
+ }
+ continue;
+ }
+
+ pmd_base = pmd_offset(pudp, 0);
+ remove_pmd_mapping(pmd_base, addr, next, is_vmemmap, altmap);
+
+ if (pgtable_l4_enabled)
+ free_pmd_table(pmd_base, pudp);
+ }
+}
+
+static void __meminit remove_p4d_mapping(p4d_t *p4d_base, unsigned long addr, unsigned long end,
+ bool is_vmemmap, struct vmem_altmap *altmap)
+{
+ unsigned long next;
+ p4d_t *p4dp, p4d;
+ pud_t *pud_base;
+
+ for (; addr < end; addr = next) {
+ next = p4d_addr_end(addr, end);
+ p4dp = p4d_base + p4d_index(addr);
+ p4d = READ_ONCE(*p4dp);
+
+ if (!p4d_present(p4d))
+ continue;
+
+ if (p4d_leaf(p4d)) {
+ if (pgtable_l5_enabled) {
+ p4d_clear(p4dp);
+ if (is_vmemmap)
+ free_vmemmap_storage(p4d_page(p4d), P4D_SIZE, altmap);
+ }
+ continue;
+ }
+
+ pud_base = pud_offset(p4dp, 0);
+ remove_pud_mapping(pud_base, addr, next, is_vmemmap, altmap);
+
+ if (pgtable_l5_enabled)
+ free_pud_table(pud_base, p4dp);
+ }
+}
+
+static void __meminit remove_pgd_mapping(unsigned long va, unsigned long end, bool is_vmemmap,
+ struct vmem_altmap *altmap)
+{
+ unsigned long addr, next;
+ p4d_t *p4d_base;
+ pgd_t *pgd;
+
+ for (addr = va; addr < end; addr = next) {
+ next = pgd_addr_end(addr, end);
+ pgd = pgd_offset_k(addr);
+
+ if (!pgd_present(*pgd))
+ continue;
+
+ if (pgd_leaf(*pgd))
+ continue;
+
+ p4d_base = p4d_offset(pgd, 0);
+ remove_p4d_mapping(p4d_base, addr, next, is_vmemmap, altmap);
+ }
+
+ flush_tlb_all();
+}
+
+static void __meminit remove_linear_mapping(phys_addr_t start, u64 size)
+{
+ unsigned long va = (unsigned long)__va(start);
+ unsigned long end = (unsigned long)__va(start + size);
+
+ remove_pgd_mapping(va, end, false, NULL);
+}
+
+struct range arch_get_mappable_range(void)
+{
+ struct range mhp_range;
+
+ mhp_range.start = __pa(PAGE_OFFSET);
+ mhp_range.end = __pa(PAGE_END - 1);
+ return mhp_range;
+}
+
+int __ref arch_add_memory(int nid, u64 start, u64 size, struct mhp_params *params)
+{
+ int ret = 0;
+
+ create_linear_mapping_range(start, start + size, 0, &params->pgprot);
+ ret = __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT, params);
+ if (ret) {
+ remove_linear_mapping(start, size);
+ goto out;
+ }
+
+ max_pfn = PFN_UP(start + size);
+ max_low_pfn = max_pfn;
+
+ out:
+ flush_tlb_all();
+ return ret;
+}
+
+void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+{
+ __remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT, altmap);
+ remove_linear_mapping(start, size);
+ flush_tlb_all();
+}
+
+void __ref vmemmap_free(unsigned long start, unsigned long end, struct vmem_altmap *altmap)
+{
+ remove_pgd_mapping(start, end, true, altmap);
+}
+#endif /* CONFIG_MEMORY_HOTPLUG */
--
2.40.1
* Re: [PATCH v3 5/9] riscv: mm: Add memory hotplugging support
2024-05-21 11:48 ` [PATCH v3 5/9] riscv: mm: Add memory hotplugging support Björn Töpel
@ 2024-05-21 13:19 ` Alexandre Ghiti
2024-05-21 14:18 ` Björn Töpel
2024-05-21 14:20 ` Oscar Salvador
0 siblings, 2 replies; 19+ messages in thread
From: Alexandre Ghiti @ 2024-05-21 13:19 UTC (permalink / raw)
To: Björn Töpel
Cc: Albert Ou, David Hildenbrand, Palmer Dabbelt, Paul Walmsley,
linux-riscv, Oscar Salvador, Björn Töpel,
Andrew Bresticker, Chethan Seshadri, Lorenzo Stoakes,
Santosh Mamila, Sivakumar Munnangi, Sunil V L, linux-kernel,
linux-mm, virtualization
On Tue, May 21, 2024 at 1:49 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> From: Björn Töpel <bjorn@rivosinc.com>
>
> For an architecture to support memory hotplugging, a couple of
> callbacks need to be implemented:
>
> arch_add_memory()
> This callback is responsible for adding the physical memory into the
> direct map, and for calling into the generic memory hotplugging code
> via __add_pages(), which adds the corresponding struct page entries
> and updates the vmemmap mapping.
>
> arch_remove_memory()
> This is the inverse of the callback above.
>
> vmemmap_free()
> This function tears down the vmemmap mappings (if
> CONFIG_SPARSEMEM_VMEMMAP is enabled), and also deallocates the
> backing vmemmap pages. Note that for persistent memory, an
> alternative allocator for the backing pages can be used: the
> vmem_altmap. This means that when the backing pages are cleared,
> extra care is needed so that the correct deallocation method is
> used.
>
> arch_get_mappable_range()
> This function returns the PA range that the direct map can map.
> Used by the MHP internals for sanity checks.
>
> The page table unmap/teardown functions are heavily based on code from
> the x86 tree. The same remove_pgd_mapping() function is used in both
> vmemmap_free() and arch_remove_memory(), but in the latter function
> the backing pages are not removed.
>
> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
> ---
> arch/riscv/mm/init.c | 261 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 261 insertions(+)
>
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 6f72b0b2b854..6693b742bf2f 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -1493,3 +1493,264 @@ void __init pgtable_cache_init(void)
> }
> }
> #endif
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +static void __meminit free_pagetable(struct page *page, int order)
> +{
> + unsigned int nr_pages = 1 << order;
> +
> + /*
> + * vmemmap/direct page tables can be reserved, if added at
> + * boot.
> + */
> + if (PageReserved(page)) {
> + __ClearPageReserved(page);
What's the difference between __ClearPageReserved() and
ClearPageReserved()? Because it seems like free_reserved_page() calls
the latter already, so why would you need to call
__ClearPageReserved() on the first page?
> + while (nr_pages--)
> + free_reserved_page(page++);
> + return;
> + }
> +
> + free_pages((unsigned long)page_address(page), order);
> +}
> +
> +static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
> +{
> + pte_t *pte;
> + int i;
> +
> + for (i = 0; i < PTRS_PER_PTE; i++) {
> + pte = pte_start + i;
> + if (!pte_none(*pte))
> + return;
> + }
> +
> + free_pagetable(pmd_page(*pmd), 0);
> + pmd_clear(pmd);
> +}
> +
> +static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
> +{
> + pmd_t *pmd;
> + int i;
> +
> + for (i = 0; i < PTRS_PER_PMD; i++) {
> + pmd = pmd_start + i;
> + if (!pmd_none(*pmd))
> + return;
> + }
> +
> + free_pagetable(pud_page(*pud), 0);
> + pud_clear(pud);
> +}
> +
> +static void __meminit free_pud_table(pud_t *pud_start, p4d_t *p4d)
> +{
> + pud_t *pud;
> + int i;
> +
> + for (i = 0; i < PTRS_PER_PUD; i++) {
> + pud = pud_start + i;
> + if (!pud_none(*pud))
> + return;
> + }
> +
> + free_pagetable(p4d_page(*p4d), 0);
> + p4d_clear(p4d);
> +}
> +
> +static void __meminit free_vmemmap_storage(struct page *page, size_t size,
> + struct vmem_altmap *altmap)
> +{
> + if (altmap)
> + vmem_altmap_free(altmap, size >> PAGE_SHIFT);
> + else
> + free_pagetable(page, get_order(size));
> +}
> +
> +static void __meminit remove_pte_mapping(pte_t *pte_base, unsigned long addr, unsigned long end,
> + bool is_vmemmap, struct vmem_altmap *altmap)
> +{
> + unsigned long next;
> + pte_t *ptep, pte;
> +
> + for (; addr < end; addr = next) {
> + next = (addr + PAGE_SIZE) & PAGE_MASK;
Nit: use ALIGN() instead.
> + if (next > end)
> + next = end;
> +
> + ptep = pte_base + pte_index(addr);
> + pte = READ_ONCE(*ptep);
Nit: Use ptep_get()
> +
> + if (!pte_present(*ptep))
> + continue;
> +
> + pte_clear(&init_mm, addr, ptep);
> + if (is_vmemmap)
> + free_vmemmap_storage(pte_page(pte), PAGE_SIZE, altmap);
> + }
> +}
> +
> +static void __meminit remove_pmd_mapping(pmd_t *pmd_base, unsigned long addr, unsigned long end,
> + bool is_vmemmap, struct vmem_altmap *altmap)
> +{
> + unsigned long next;
> + pte_t *pte_base;
> + pmd_t *pmdp, pmd;
> +
> + for (; addr < end; addr = next) {
> + next = pmd_addr_end(addr, end);
> + pmdp = pmd_base + pmd_index(addr);
> + pmd = READ_ONCE(*pmdp);
Nit: Use pmdp_get()
> +
> + if (!pmd_present(pmd))
> + continue;
> +
> + if (pmd_leaf(pmd)) {
> + pmd_clear(pmdp);
> + if (is_vmemmap)
> + free_vmemmap_storage(pmd_page(pmd), PMD_SIZE, altmap);
> + continue;
> + }
> +
> + pte_base = (pte_t *)pmd_page_vaddr(*pmdp);
> + remove_pte_mapping(pte_base, addr, next, is_vmemmap, altmap);
> + free_pte_table(pte_base, pmdp);
> + }
> +}
> +
> +static void __meminit remove_pud_mapping(pud_t *pud_base, unsigned long addr, unsigned long end,
> + bool is_vmemmap, struct vmem_altmap *altmap)
> +{
> + unsigned long next;
> + pud_t *pudp, pud;
> + pmd_t *pmd_base;
> +
> + for (; addr < end; addr = next) {
> + next = pud_addr_end(addr, end);
> + pudp = pud_base + pud_index(addr);
> + pud = READ_ONCE(*pudp);
Nit: Use pudp_get()
> +
> + if (!pud_present(pud))
> + continue;
> +
> + if (pud_leaf(pud)) {
> + if (pgtable_l4_enabled) {
> + pud_clear(pudp);
> + if (is_vmemmap)
> + free_vmemmap_storage(pud_page(pud), PUD_SIZE, altmap);
> + }
> + continue;
> + }
> +
> + pmd_base = pmd_offset(pudp, 0);
> + remove_pmd_mapping(pmd_base, addr, next, is_vmemmap, altmap);
> +
> + if (pgtable_l4_enabled)
> + free_pmd_table(pmd_base, pudp);
> + }
> +}
> +
> +static void __meminit remove_p4d_mapping(p4d_t *p4d_base, unsigned long addr, unsigned long end,
> + bool is_vmemmap, struct vmem_altmap *altmap)
> +{
> + unsigned long next;
> + p4d_t *p4dp, p4d;
> + pud_t *pud_base;
> +
> + for (; addr < end; addr = next) {
> + next = p4d_addr_end(addr, end);
> + p4dp = p4d_base + p4d_index(addr);
> + p4d = READ_ONCE(*p4dp);
Nit: Use p4dp_get()
> +
> + if (!p4d_present(p4d))
> + continue;
> +
> + if (p4d_leaf(p4d)) {
> + if (pgtable_l5_enabled) {
> + p4d_clear(p4dp);
> + if (is_vmemmap)
> + free_vmemmap_storage(p4d_page(p4d), P4D_SIZE, altmap);
> + }
> + continue;
> + }
> +
> + pud_base = pud_offset(p4dp, 0);
> + remove_pud_mapping(pud_base, addr, next, is_vmemmap, altmap);
> +
> + if (pgtable_l5_enabled)
> + free_pud_table(pud_base, p4dp);
> + }
> +}
> +
> +static void __meminit remove_pgd_mapping(unsigned long va, unsigned long end, bool is_vmemmap,
> + struct vmem_altmap *altmap)
> +{
> + unsigned long addr, next;
> + p4d_t *p4d_base;
> + pgd_t *pgd;
> +
> + for (addr = va; addr < end; addr = next) {
> + next = pgd_addr_end(addr, end);
> + pgd = pgd_offset_k(addr);
> +
> + if (!pgd_present(*pgd))
> + continue;
> +
> + if (pgd_leaf(*pgd))
> + continue;
> +
> + p4d_base = p4d_offset(pgd, 0);
> + remove_p4d_mapping(p4d_base, addr, next, is_vmemmap, altmap);
> + }
> +
> + flush_tlb_all();
> +}
> +
> +static void __meminit remove_linear_mapping(phys_addr_t start, u64 size)
> +{
> + unsigned long va = (unsigned long)__va(start);
> + unsigned long end = (unsigned long)__va(start + size);
> +
> + remove_pgd_mapping(va, end, false, NULL);
> +}
> +
> +struct range arch_get_mappable_range(void)
> +{
> + struct range mhp_range;
> +
> + mhp_range.start = __pa(PAGE_OFFSET);
> + mhp_range.end = __pa(PAGE_END - 1);
> + return mhp_range;
> +}
> +
> +int __ref arch_add_memory(int nid, u64 start, u64 size, struct mhp_params *params)
> +{
> + int ret = 0;
> +
> + create_linear_mapping_range(start, start + size, 0, &params->pgprot);
> + ret = __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT, params);
> + if (ret) {
> + remove_linear_mapping(start, size);
> + goto out;
> + }
> +
> + max_pfn = PFN_UP(start + size);
> + max_low_pfn = max_pfn;
> +
> + out:
> + flush_tlb_all();
> + return ret;
> +}
> +
> +void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
> +{
> + __remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT, altmap);
> + remove_linear_mapping(start, size);
> + flush_tlb_all();
> +}
> +
> +void __ref vmemmap_free(unsigned long start, unsigned long end, struct vmem_altmap *altmap)
> +{
> + remove_pgd_mapping(start, end, true, altmap);
> +}
> +#endif /* CONFIG_MEMORY_HOTPLUG */
> --
> 2.40.1
>
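
For readers weighing the ALIGN() nit above, the following userspace sketch
compares the open-coded step from remove_pte_mapping() with an ALIGN()-style
round-up. The SKETCH_* macros are local stand-ins rather than the kernel
definitions, and the 4 KiB page size is an assumption. One detail worth
noting: rounding up addr itself does not advance when addr is already page
aligned, so the sketch rounds up addr + 1 to keep the loop moving.

```c
#include <assert.h>

/* Local stand-ins for the kernel macros; 4 KiB pages are assumed. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Rounds x up to the next multiple of a; a must be a power of two. */
unsigned long sketch_align_up(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + (a - 1)) & ~(a - 1);
}

/* Open-coded form from the patch: next page boundary strictly above addr. */
unsigned long sketch_next_open_coded(unsigned long addr)
{
	return (addr + SKETCH_PAGE_SIZE) & SKETCH_PAGE_MASK;
}

/* Round-up form; the "+ 1" makes an already aligned addr advance too. */
unsigned long sketch_next_align(unsigned long addr)
{
	return sketch_align_up(addr + 1, SKETCH_PAGE_SIZE);
}
```

Both helpers agree for aligned and unaligned inputs, whereas
sketch_align_up(addr, SKETCH_PAGE_SIZE) alone returns addr unchanged when
addr already sits on a page boundary.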
* Re: [PATCH v3 5/9] riscv: mm: Add memory hotplugging support
2024-05-21 13:19 ` Alexandre Ghiti
@ 2024-05-21 14:18 ` Björn Töpel
2024-05-21 14:20 ` Oscar Salvador
1 sibling, 0 replies; 19+ messages in thread
From: Björn Töpel @ 2024-05-21 14:18 UTC (permalink / raw)
To: Alexandre Ghiti
Cc: Albert Ou, David Hildenbrand, Palmer Dabbelt, Paul Walmsley,
linux-riscv, Oscar Salvador, Björn Töpel,
Andrew Bresticker, Chethan Seshadri, Lorenzo Stoakes,
Santosh Mamila, Sivakumar Munnangi, Sunil V L, linux-kernel,
linux-mm, virtualization
Alexandre Ghiti <alexghiti@rivosinc.com> writes:
> On Tue, May 21, 2024 at 1:49 PM Björn Töpel <bjorn@kernel.org> wrote:
>>
>> From: Björn Töpel <bjorn@rivosinc.com>
>>
>> For an architecture to support memory hotplugging, a couple of
>> callbacks need to be implemented:
>>
>> arch_add_memory()
>> This callback is responsible for adding the physical memory into the
>> direct map, and for calling into the generic memory hotplugging code
>> via __add_pages(), which adds the corresponding struct page entries
>> and updates the vmemmap mapping.
>>
>> arch_remove_memory()
>> This is the inverse of the callback above.
>>
>> vmemmap_free()
>> This function tears down the vmemmap mappings (if
>> CONFIG_SPARSEMEM_VMEMMAP is enabled), and also deallocates the
>> backing vmemmap pages. Note that for persistent memory, an
>> alternative allocator for the backing pages can be used: the
>> vmem_altmap. This means that when the backing pages are cleared,
>> extra care is needed so that the correct deallocation method is
>> used.
>>
>> arch_get_mappable_range()
>> This function returns the PA range that the direct map can map.
>> Used by the MHP internals for sanity checks.
>>
>> The page table unmap/teardown functions are heavily based on code from
>> the x86 tree. The same remove_pgd_mapping() function is used in both
>> vmemmap_free() and arch_remove_memory(), but in the latter function
>> the backing pages are not removed.
>>
>> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
>> ---
>> arch/riscv/mm/init.c | 261 +++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 261 insertions(+)
>>
>> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
>> index 6f72b0b2b854..6693b742bf2f 100644
>> --- a/arch/riscv/mm/init.c
>> +++ b/arch/riscv/mm/init.c
>> @@ -1493,3 +1493,264 @@ void __init pgtable_cache_init(void)
>> }
>> }
>> #endif
>> +
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +static void __meminit free_pagetable(struct page *page, int order)
>> +{
>> + unsigned int nr_pages = 1 << order;
>> +
>> + /*
>> + * vmemmap/direct page tables can be reserved, if added at
>> + * boot.
>> + */
>> + if (PageReserved(page)) {
>> + __ClearPageReserved(page);
>
> What's the difference between __ClearPageReserved() and
> ClearPageReserved()? Because it seems like free_reserved_page() calls
> the latter already, so why would you need to call
> __ClearPageReserved() on the first page?
Indeed! x86 copy pasta (which uses bootmem info page that RV doesn't).
>> + while (nr_pages--)
>> + free_reserved_page(page++);
>> + return;
>> + }
>> +
>> + free_pages((unsigned long)page_address(page), order);
>> +}
>> +
>> +static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
>> +{
>> + pte_t *pte;
>> + int i;
>> +
>> + for (i = 0; i < PTRS_PER_PTE; i++) {
>> + pte = pte_start + i;
>> + if (!pte_none(*pte))
>> + return;
>> + }
>> +
>> + free_pagetable(pmd_page(*pmd), 0);
>> + pmd_clear(pmd);
>> +}
>> +
>> +static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
>> +{
>> + pmd_t *pmd;
>> + int i;
>> +
>> + for (i = 0; i < PTRS_PER_PMD; i++) {
>> + pmd = pmd_start + i;
>> + if (!pmd_none(*pmd))
>> + return;
>> + }
>> +
>> + free_pagetable(pud_page(*pud), 0);
>> + pud_clear(pud);
>> +}
>> +
>> +static void __meminit free_pud_table(pud_t *pud_start, p4d_t *p4d)
>> +{
>> + pud_t *pud;
>> + int i;
>> +
>> + for (i = 0; i < PTRS_PER_PUD; i++) {
>> + pud = pud_start + i;
>> + if (!pud_none(*pud))
>> + return;
>> + }
>> +
>> + free_pagetable(p4d_page(*p4d), 0);
>> + p4d_clear(p4d);
>> +}
>> +
>> +static void __meminit free_vmemmap_storage(struct page *page, size_t size,
>> + struct vmem_altmap *altmap)
>> +{
>> + if (altmap)
>> + vmem_altmap_free(altmap, size >> PAGE_SHIFT);
>> + else
>> + free_pagetable(page, get_order(size));
>> +}
>> +
>> +static void __meminit remove_pte_mapping(pte_t *pte_base, unsigned long addr, unsigned long end,
>> + bool is_vmemmap, struct vmem_altmap *altmap)
>> +{
>> + unsigned long next;
>> + pte_t *ptep, pte;
>> +
>> + for (; addr < end; addr = next) {
>> + next = (addr + PAGE_SIZE) & PAGE_MASK;
>
> Nit: use ALIGN() instead.
>
>> + if (next > end)
>> + next = end;
>> +
>> + ptep = pte_base + pte_index(addr);
>> + pte = READ_ONCE(*ptep);
>
> Nit: Use ptep_get()
>
>> +
>> + if (!pte_present(*ptep))
>> + continue;
>> +
>> + pte_clear(&init_mm, addr, ptep);
>> + if (is_vmemmap)
>> + free_vmemmap_storage(pte_page(pte), PAGE_SIZE, altmap);
>> + }
>> +}
>> +
>> +static void __meminit remove_pmd_mapping(pmd_t *pmd_base, unsigned long addr, unsigned long end,
>> + bool is_vmemmap, struct vmem_altmap *altmap)
>> +{
>> + unsigned long next;
>> + pte_t *pte_base;
>> + pmd_t *pmdp, pmd;
>> +
>> + for (; addr < end; addr = next) {
>> + next = pmd_addr_end(addr, end);
>> + pmdp = pmd_base + pmd_index(addr);
>> + pmd = READ_ONCE(*pmdp);
>
> Nit: Use pmdp_get()
>
>> +
>> + if (!pmd_present(pmd))
>> + continue;
>> +
>> + if (pmd_leaf(pmd)) {
>> + pmd_clear(pmdp);
>> + if (is_vmemmap)
>> + free_vmemmap_storage(pmd_page(pmd), PMD_SIZE, altmap);
>> + continue;
>> + }
>> +
>> + pte_base = (pte_t *)pmd_page_vaddr(*pmdp);
>> + remove_pte_mapping(pte_base, addr, next, is_vmemmap, altmap);
>> + free_pte_table(pte_base, pmdp);
>> + }
>> +}
>> +
>> +static void __meminit remove_pud_mapping(pud_t *pud_base, unsigned long addr, unsigned long end,
>> + bool is_vmemmap, struct vmem_altmap *altmap)
>> +{
>> + unsigned long next;
>> + pud_t *pudp, pud;
>> + pmd_t *pmd_base;
>> +
>> + for (; addr < end; addr = next) {
>> + next = pud_addr_end(addr, end);
>> + pudp = pud_base + pud_index(addr);
>> + pud = READ_ONCE(*pudp);
>
> Nit: Use pudp_get()
>
>> +
>> + if (!pud_present(pud))
>> + continue;
>> +
>> + if (pud_leaf(pud)) {
>> + if (pgtable_l4_enabled) {
>> + pud_clear(pudp);
>> + if (is_vmemmap)
>> + free_vmemmap_storage(pud_page(pud), PUD_SIZE, altmap);
>> + }
>> + continue;
>> + }
>> +
>> + pmd_base = pmd_offset(pudp, 0);
>> + remove_pmd_mapping(pmd_base, addr, next, is_vmemmap, altmap);
>> +
>> + if (pgtable_l4_enabled)
>> + free_pmd_table(pmd_base, pudp);
>> + }
>> +}
>> +
>> +static void __meminit remove_p4d_mapping(p4d_t *p4d_base, unsigned long addr, unsigned long end,
>> + bool is_vmemmap, struct vmem_altmap *altmap)
>> +{
>> + unsigned long next;
>> + p4d_t *p4dp, p4d;
>> + pud_t *pud_base;
>> +
>> + for (; addr < end; addr = next) {
>> + next = p4d_addr_end(addr, end);
>> + p4dp = p4d_base + p4d_index(addr);
>> + p4d = READ_ONCE(*p4dp);
>
> Nit: Use p4dp_get()
...and I'll make sure to address these nits as well.
Thanks!
Björn
* Re: [PATCH v3 5/9] riscv: mm: Add memory hotplugging support
2024-05-21 13:19 ` Alexandre Ghiti
2024-05-21 14:18 ` Björn Töpel
@ 2024-05-21 14:20 ` Oscar Salvador
1 sibling, 0 replies; 19+ messages in thread
From: Oscar Salvador @ 2024-05-21 14:20 UTC (permalink / raw)
To: Alexandre Ghiti
Cc: Björn Töpel, Albert Ou, David Hildenbrand,
Palmer Dabbelt, Paul Walmsley, linux-riscv,
Björn Töpel, Andrew Bresticker, Chethan Seshadri,
Lorenzo Stoakes, Santosh Mamila, Sivakumar Munnangi, Sunil V L,
linux-kernel, linux-mm, virtualization
On Tue, May 21, 2024 at 03:19:37PM +0200, Alexandre Ghiti wrote:
> On Tue, May 21, 2024 at 1:49 PM Björn Töpel <bjorn@kernel.org> wrote:
> > + if (PageReserved(page)) {
> > + __ClearPageReserved(page);
>
> What's the difference between __ClearPageReserved() and
> ClearPageReserved()? Because it seems like free_reserved_page() calls
> the latter already, so why would you need to call
> __ClearPageReserved() on the first page?
The __{Set,Clear}Page*() helpers are the non-atomic versions.
Usually used when you know that no one else can fiddle with the page, which
should be the case here since we are removing the memory.
As to why we call __ClearPageReserved() and then have
free_reserved_page() call ClearPageReserved(), I do not really know.
Looking at the history, it has always been like this.
I remember I looked at this a few years ago but I cannot remember the outcome
of that.
Maybe David remembers better, but I think we could remove that
__ClearPageReserved.
Looking at the powerpc implementation, it does not do the
__ClearPageReserved and relies only on free_reserved_page().
I will have a look.
--
Oscar Salvador
SUSE Labs
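
To make the atomic versus non-atomic distinction discussed above concrete,
here is a minimal userspace sketch using C11 atomics. It is not the kernel
implementation: struct sketch_page, the SKETCH_PG_RESERVED bit position and
the sketch_* helpers are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; the real bit position is not taken from the thread. */
#define SKETCH_PG_RESERVED (1UL << 0)

struct sketch_page {
	_Atomic unsigned long flags;
};

/* Atomic flavour: a single indivisible read-modify-write. */
void sketch_clear_reserved(struct sketch_page *p)
{
	assert(p != NULL);
	atomic_fetch_and(&p->flags, ~SKETCH_PG_RESERVED);
}

/*
 * Non-atomic flavour: a separate load and store. A concurrent update to
 * another flag between the two steps would be lost, so this is only
 * safe when nobody else can touch the page.
 */
void sketch_clear_reserved_nonatomic(struct sketch_page *p)
{
	unsigned long v;

	assert(p != NULL);
	v = atomic_load_explicit(&p->flags, memory_order_relaxed);
	atomic_store_explicit(&p->flags, v & ~SKETCH_PG_RESERVED,
			      memory_order_relaxed);
}

bool sketch_page_reserved(struct sketch_page *p)
{
	assert(p != NULL);
	return (atomic_load(&p->flags) & SKETCH_PG_RESERVED) != 0;
}
```

With a single owner both variants produce the same result, which is the
situation described above for memory that is being removed.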
* [PATCH v3 6/9] riscv: mm: Take memory hotplug read-lock during kernel page table dump
2024-05-21 11:48 [PATCH v3 0/9] riscv: Memory Hot(Un)Plug support Björn Töpel
` (4 preceding siblings ...)
2024-05-21 11:48 ` [PATCH v3 5/9] riscv: mm: Add memory hotplugging support Björn Töpel
@ 2024-05-21 11:48 ` Björn Töpel
2024-05-21 11:48 ` [PATCH v3 7/9] riscv: Enable memory hotplugging for RISC-V Björn Töpel
` (2 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Björn Töpel @ 2024-05-21 11:48 UTC (permalink / raw)
To: Alexandre Ghiti, Albert Ou, David Hildenbrand, Palmer Dabbelt,
Paul Walmsley, linux-riscv, Oscar Salvador
Cc: Björn Töpel, Andrew Bresticker, Chethan Seshadri,
Lorenzo Stoakes, Santosh Mamila, Sivakumar Munnangi, Sunil V L,
linux-kernel, linux-mm, virtualization
From: Björn Töpel <bjorn@rivosinc.com>
During memory hot remove, the ptdump functionality can end up touching
stale data. Avoid any potential crashes (or worse), by holding the
memory hotplug read-lock while traversing the page table.
This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm:
Hold memory hotplug lock while walking for kernel page table dump").
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
---
arch/riscv/mm/ptdump.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
index 1289cc6d3700..9d5f657a251b 100644
--- a/arch/riscv/mm/ptdump.c
+++ b/arch/riscv/mm/ptdump.c
@@ -6,6 +6,7 @@
#include <linux/efi.h>
#include <linux/init.h>
#include <linux/debugfs.h>
+#include <linux/memory_hotplug.h>
#include <linux/seq_file.h>
#include <linux/ptdump.h>
@@ -370,7 +371,9 @@ bool ptdump_check_wx(void)
static int ptdump_show(struct seq_file *m, void *v)
{
+ get_online_mems();
ptdump_walk(m, m->private);
+ put_online_mems();
return 0;
}
--
2.40.1
* [PATCH v3 7/9] riscv: Enable memory hotplugging for RISC-V
2024-05-21 11:48 [PATCH v3 0/9] riscv: Memory Hot(Un)Plug support Björn Töpel
` (5 preceding siblings ...)
2024-05-21 11:48 ` [PATCH v3 6/9] riscv: mm: Take memory hotplug read-lock during kernel page table dump Björn Töpel
@ 2024-05-21 11:48 ` Björn Töpel
2024-05-21 13:23 ` Alexandre Ghiti
2024-05-21 11:48 ` [PATCH v3 8/9] virtio-mem: Enable virtio-mem " Björn Töpel
2024-05-21 11:48 ` [PATCH v3 9/9] riscv: mm: Add support for ZONE_DEVICE Björn Töpel
8 siblings, 1 reply; 19+ messages in thread
From: Björn Töpel @ 2024-05-21 11:48 UTC (permalink / raw)
To: Alexandre Ghiti, Albert Ou, David Hildenbrand, Palmer Dabbelt,
Paul Walmsley, linux-riscv, Oscar Salvador
Cc: Björn Töpel, Andrew Bresticker, Chethan Seshadri,
Lorenzo Stoakes, Santosh Mamila, Sivakumar Munnangi, Sunil V L,
linux-kernel, linux-mm, virtualization
From: Björn Töpel <bjorn@rivosinc.com>
Enable ARCH_ENABLE_MEMORY_HOTPLUG and ARCH_ENABLE_MEMORY_HOTREMOVE for
RISC-V.
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
---
arch/riscv/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index fe5281398543..2724dc2af29f 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -16,6 +16,8 @@ config RISCV
select ACPI_REDUCED_HARDWARE_ONLY if ACPI
select ARCH_DMA_DEFAULT_COHERENT
select ARCH_ENABLE_HUGEPAGE_MIGRATION if HUGETLB_PAGE && MIGRATION
+ select ARCH_ENABLE_MEMORY_HOTPLUG if SPARSEMEM_VMEMMAP && 64BIT && MMU
+ select ARCH_ENABLE_MEMORY_HOTREMOVE if MEMORY_HOTPLUG
select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
select ARCH_HAS_BINFMT_FLAT
--
2.40.1
* Re: [PATCH v3 7/9] riscv: Enable memory hotplugging for RISC-V
2024-05-21 11:48 ` [PATCH v3 7/9] riscv: Enable memory hotplugging for RISC-V Björn Töpel
@ 2024-05-21 13:23 ` Alexandre Ghiti
0 siblings, 0 replies; 19+ messages in thread
From: Alexandre Ghiti @ 2024-05-21 13:23 UTC (permalink / raw)
To: Björn Töpel
Cc: Albert Ou, David Hildenbrand, Palmer Dabbelt, Paul Walmsley,
linux-riscv, Oscar Salvador, Björn Töpel,
Andrew Bresticker, Chethan Seshadri, Lorenzo Stoakes,
Santosh Mamila, Sivakumar Munnangi, Sunil V L, linux-kernel,
linux-mm, virtualization
On Tue, May 21, 2024 at 1:49 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> From: Björn Töpel <bjorn@rivosinc.com>
>
> Enable ARCH_ENABLE_MEMORY_HOTPLUG and ARCH_ENABLE_MEMORY_HOTREMOVE for
> RISC-V.
>
> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
> ---
> arch/riscv/Kconfig | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index fe5281398543..2724dc2af29f 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -16,6 +16,8 @@ config RISCV
> select ACPI_REDUCED_HARDWARE_ONLY if ACPI
> select ARCH_DMA_DEFAULT_COHERENT
> select ARCH_ENABLE_HUGEPAGE_MIGRATION if HUGETLB_PAGE && MIGRATION
> + select ARCH_ENABLE_MEMORY_HOTPLUG if SPARSEMEM_VMEMMAP && 64BIT && MMU
Not sure you need 64BIT && MMU here since ARCH_SPARSEMEM_ENABLE
depends on MMU and SPARSEMEM_VMEMMAP_ENABLE is only enabled on 64BIT.
> + select ARCH_ENABLE_MEMORY_HOTREMOVE if MEMORY_HOTPLUG
> select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
> select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
> select ARCH_HAS_BINFMT_FLAT
> --
> 2.40.1
>
But anyway, to me that does not require a new version so you can add:
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Thanks,
Alex
* [PATCH v3 8/9] virtio-mem: Enable virtio-mem for RISC-V
2024-05-21 11:48 [PATCH v3 0/9] riscv: Memory Hot(Un)Plug support Björn Töpel
` (6 preceding siblings ...)
2024-05-21 11:48 ` [PATCH v3 7/9] riscv: Enable memory hotplugging for RISC-V Björn Töpel
@ 2024-05-21 11:48 ` Björn Töpel
2024-05-21 11:48 ` [PATCH v3 9/9] riscv: mm: Add support for ZONE_DEVICE Björn Töpel
8 siblings, 0 replies; 19+ messages in thread
From: Björn Töpel @ 2024-05-21 11:48 UTC (permalink / raw)
To: Alexandre Ghiti, Albert Ou, David Hildenbrand, Palmer Dabbelt,
Paul Walmsley, linux-riscv, Oscar Salvador
Cc: Björn Töpel, Andrew Bresticker, Chethan Seshadri,
Lorenzo Stoakes, Santosh Mamila, Sivakumar Munnangi, Sunil V L,
linux-kernel, linux-mm, virtualization
From: Björn Töpel <bjorn@rivosinc.com>
Now that RISC-V has memory hotplugging support, virtio-mem can be used
on the platform.
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
---
drivers/virtio/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index c17193544268..4e5cebf1b82a 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -122,7 +122,7 @@ config VIRTIO_BALLOON
config VIRTIO_MEM
tristate "Virtio mem driver"
- depends on X86_64 || ARM64
+ depends on X86_64 || ARM64 || RISCV
depends on VIRTIO
depends on MEMORY_HOTPLUG
depends on MEMORY_HOTREMOVE
--
2.40.1
* [PATCH v3 9/9] riscv: mm: Add support for ZONE_DEVICE
From: Björn Töpel @ 2024-05-21 11:48 UTC (permalink / raw)
To: Alexandre Ghiti, Albert Ou, David Hildenbrand, Palmer Dabbelt,
Paul Walmsley, linux-riscv, Oscar Salvador
Cc: Björn Töpel, Andrew Bresticker, Chethan Seshadri,
Lorenzo Stoakes, Santosh Mamila, Sivakumar Munnangi, Sunil V L,
linux-kernel, linux-mm, virtualization
From: Björn Töpel <bjorn@rivosinc.com>
ZONE_DEVICE pages need DEVMAP PTE support to function
(ARCH_HAS_PTE_DEVMAP). Claim another RSW (reserved for software) bit
in the PTE for the DEVMAP mark, add the corresponding helpers, and
enable ARCH_HAS_PTE_DEVMAP for riscv64.
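
Editorial sketch, not part of the patch: the bit assignment described above can be modelled in plain userspace C. The three `_PAGE_*` values are copied from the pgtable-bits.h hunk in this patch; the `demo_` type and helpers are hypothetical stand-ins for the kernel's `pte_t` helpers, and they normalise the test result to 0/1, which the patch itself does not do.

```c
#include <assert.h>
#include <stdint.h>

/* Values as they appear in the pgtable-bits.h hunk of this patch. */
#define _PAGE_SOFT    (3 << 8)  /* Reserved for software (RSW), two bits */
#define _PAGE_SPECIAL (1 << 8)  /* RSW: 0x1 */
#define _PAGE_DEVMAP  (1 << 9)  /* RSW, devmap */

/* Hypothetical stand-in for the kernel's pte_t (assumption for this demo). */
typedef struct { uint64_t pte; } demo_pte_t;

static inline demo_pte_t demo_mkpte(uint64_t val)
{
	demo_pte_t pte = { val };
	return pte;
}

/* Test helpers: report whether the corresponding RSW bit is set. */
static inline int demo_pte_special(demo_pte_t pte)
{
	return (pte.pte & _PAGE_SPECIAL) != 0;
}

static inline int demo_pte_devmap(demo_pte_t pte)
{
	return (pte.pte & _PAGE_DEVMAP) != 0;
}

/* Setter: returns a copy of the entry with the devmap bit set. */
static inline demo_pte_t demo_pte_mkdevmap(demo_pte_t pte)
{
	return demo_mkpte(pte.pte | _PAGE_DEVMAP);
}
```

With these definitions, `_PAGE_SPECIAL | _PAGE_DEVMAP` equals `_PAGE_SOFT`, so after this patch both software-reserved bits covered by that mask are in use.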
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
---
arch/riscv/Kconfig | 1 +
arch/riscv/include/asm/pgtable-64.h | 20 ++++++++++++++++++++
arch/riscv/include/asm/pgtable-bits.h | 1 +
arch/riscv/include/asm/pgtable.h | 17 +++++++++++++++++
4 files changed, 39 insertions(+)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 2724dc2af29f..0b74698c63c7 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -36,6 +36,7 @@ config RISCV
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PMEM_API
select ARCH_HAS_PREPARE_SYNC_CORE_CMD
+ select ARCH_HAS_PTE_DEVMAP if 64BIT && MMU
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_SET_DIRECT_MAP if MMU
select ARCH_HAS_SET_MEMORY if MMU
diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
index 221a5c1ee287..c67a9bbfd010 100644
--- a/arch/riscv/include/asm/pgtable-64.h
+++ b/arch/riscv/include/asm/pgtable-64.h
@@ -400,4 +400,24 @@ static inline struct page *pgd_page(pgd_t pgd)
#define p4d_offset p4d_offset
p4d_t *p4d_offset(pgd_t *pgd, unsigned long address);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static inline int pte_devmap(pte_t pte);
+static inline pte_t pmd_pte(pmd_t pmd);
+
+static inline int pmd_devmap(pmd_t pmd)
+{
+ return pte_devmap(pmd_pte(pmd));
+}
+
+static inline int pud_devmap(pud_t pud)
+{
+ return 0;
+}
+
+static inline int pgd_devmap(pgd_t pgd)
+{
+ return 0;
+}
+#endif
+
#endif /* _ASM_RISCV_PGTABLE_64_H */
diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h
index 179bd4afece4..a8f5205cea54 100644
--- a/arch/riscv/include/asm/pgtable-bits.h
+++ b/arch/riscv/include/asm/pgtable-bits.h
@@ -19,6 +19,7 @@
#define _PAGE_SOFT (3 << 8) /* Reserved for software */
#define _PAGE_SPECIAL (1 << 8) /* RSW: 0x1 */
+#define _PAGE_DEVMAP (1 << 9) /* RSW, devmap */
#define _PAGE_TABLE _PAGE_PRESENT
/*
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 7933f493db71..02fadc276064 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -387,6 +387,13 @@ static inline int pte_special(pte_t pte)
return pte_val(pte) & _PAGE_SPECIAL;
}
+#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP
+static inline int pte_devmap(pte_t pte)
+{
+ return pte_val(pte) & _PAGE_DEVMAP;
+}
+#endif
+
/* static inline pte_t pte_rdprotect(pte_t pte) */
static inline pte_t pte_wrprotect(pte_t pte)
@@ -428,6 +435,11 @@ static inline pte_t pte_mkspecial(pte_t pte)
return __pte(pte_val(pte) | _PAGE_SPECIAL);
}
+static inline pte_t pte_mkdevmap(pte_t pte)
+{
+ return __pte(pte_val(pte) | _PAGE_DEVMAP);
+}
+
static inline pte_t pte_mkhuge(pte_t pte)
{
return pte;
@@ -711,6 +723,11 @@ static inline pmd_t pmd_mkdirty(pmd_t pmd)
return pte_pmd(pte_mkdirty(pmd_pte(pmd)));
}
+static inline pmd_t pmd_mkdevmap(pmd_t pmd)
+{
+ return pte_pmd(pte_mkdevmap(pmd_pte(pmd)));
+}
+
static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd)
{
--
2.40.1
* Re: [PATCH v3 9/9] riscv: mm: Add support for ZONE_DEVICE
From: Alexandre Ghiti @ 2024-05-21 13:41 UTC (permalink / raw)
To: Björn Töpel
Cc: Albert Ou, David Hildenbrand, Palmer Dabbelt, Paul Walmsley,
linux-riscv, Oscar Salvador, Björn Töpel,
Andrew Bresticker, Chethan Seshadri, Lorenzo Stoakes,
Santosh Mamila, Sivakumar Munnangi, Sunil V L, linux-kernel,
linux-mm, virtualization
On Tue, May 21, 2024 at 1:49 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> From: Björn Töpel <bjorn@rivosinc.com>
>
> ZONE_DEVICE pages need DEVMAP PTEs support to function
> (ARCH_HAS_PTE_DEVMAP). Claim another RSW (reserved for software) bit
> in the PTE for DEVMAP mark, add the corresponding helpers, and enable
> ARCH_HAS_PTE_DEVMAP for riscv64.
>
> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
> ---
> arch/riscv/Kconfig | 1 +
> arch/riscv/include/asm/pgtable-64.h | 20 ++++++++++++++++++++
> arch/riscv/include/asm/pgtable-bits.h | 1 +
> arch/riscv/include/asm/pgtable.h | 17 +++++++++++++++++
> 4 files changed, 39 insertions(+)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 2724dc2af29f..0b74698c63c7 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -36,6 +36,7 @@ config RISCV
> select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
> select ARCH_HAS_PMEM_API
> select ARCH_HAS_PREPARE_SYNC_CORE_CMD
> + select ARCH_HAS_PTE_DEVMAP if 64BIT && MMU
> select ARCH_HAS_PTE_SPECIAL
> select ARCH_HAS_SET_DIRECT_MAP if MMU
> select ARCH_HAS_SET_MEMORY if MMU
> diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
> index 221a5c1ee287..c67a9bbfd010 100644
> --- a/arch/riscv/include/asm/pgtable-64.h
> +++ b/arch/riscv/include/asm/pgtable-64.h
> @@ -400,4 +400,24 @@ static inline struct page *pgd_page(pgd_t pgd)
> #define p4d_offset p4d_offset
> p4d_t *p4d_offset(pgd_t *pgd, unsigned long address);
>
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +static inline int pte_devmap(pte_t pte);
> +static inline pte_t pmd_pte(pmd_t pmd);
> +
> +static inline int pmd_devmap(pmd_t pmd)
> +{
> + return pte_devmap(pmd_pte(pmd));
> +}
> +
> +static inline int pud_devmap(pud_t pud)
> +{
> + return 0;
> +}
> +
> +static inline int pgd_devmap(pgd_t pgd)
> +{
> + return 0;
> +}
> +#endif
> +
> #endif /* _ASM_RISCV_PGTABLE_64_H */
> diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h
> index 179bd4afece4..a8f5205cea54 100644
> --- a/arch/riscv/include/asm/pgtable-bits.h
> +++ b/arch/riscv/include/asm/pgtable-bits.h
> @@ -19,6 +19,7 @@
> #define _PAGE_SOFT (3 << 8) /* Reserved for software */
>
> #define _PAGE_SPECIAL (1 << 8) /* RSW: 0x1 */
> +#define _PAGE_DEVMAP (1 << 9) /* RSW, devmap */
> #define _PAGE_TABLE _PAGE_PRESENT
>
> /*
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 7933f493db71..02fadc276064 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -387,6 +387,13 @@ static inline int pte_special(pte_t pte)
> return pte_val(pte) & _PAGE_SPECIAL;
> }
>
> +#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP
> +static inline int pte_devmap(pte_t pte)
> +{
> + return pte_val(pte) & _PAGE_DEVMAP;
> +}
> +#endif
Not sure you need the #ifdef here.
> +
> /* static inline pte_t pte_rdprotect(pte_t pte) */
>
> static inline pte_t pte_wrprotect(pte_t pte)
> @@ -428,6 +435,11 @@ static inline pte_t pte_mkspecial(pte_t pte)
> return __pte(pte_val(pte) | _PAGE_SPECIAL);
> }
>
> +static inline pte_t pte_mkdevmap(pte_t pte)
> +{
> + return __pte(pte_val(pte) | _PAGE_DEVMAP);
> +}
> +
> static inline pte_t pte_mkhuge(pte_t pte)
> {
> return pte;
> @@ -711,6 +723,11 @@ static inline pmd_t pmd_mkdirty(pmd_t pmd)
> return pte_pmd(pte_mkdirty(pmd_pte(pmd)));
> }
>
> +static inline pmd_t pmd_mkdevmap(pmd_t pmd)
> +{
> + return pte_pmd(pte_mkdevmap(pmd_pte(pmd)));
> +}
> +
> static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
> pmd_t *pmdp, pmd_t pmd)
> {
> --
> 2.40.1
>
Otherwise, you can add:
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Thanks,
Alex
* Re: [PATCH v3 9/9] riscv: mm: Add support for ZONE_DEVICE
From: Björn Töpel @ 2024-05-21 14:13 UTC (permalink / raw)
To: Alexandre Ghiti
Cc: Albert Ou, David Hildenbrand, Palmer Dabbelt, Paul Walmsley,
linux-riscv, Oscar Salvador, Björn Töpel,
Andrew Bresticker, Chethan Seshadri, Lorenzo Stoakes,
Santosh Mamila, Sivakumar Munnangi, Sunil V L, linux-kernel,
linux-mm, virtualization
Alexandre Ghiti <alexghiti@rivosinc.com> writes:
> On Tue, May 21, 2024 at 1:49 PM Björn Töpel <bjorn@kernel.org> wrote:
>>
>> From: Björn Töpel <bjorn@rivosinc.com>
>>
>> ZONE_DEVICE pages need DEVMAP PTEs support to function
>> (ARCH_HAS_PTE_DEVMAP). Claim another RSW (reserved for software) bit
>> in the PTE for DEVMAP mark, add the corresponding helpers, and enable
>> ARCH_HAS_PTE_DEVMAP for riscv64.
>>
>> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
>> ---
>> arch/riscv/Kconfig | 1 +
>> arch/riscv/include/asm/pgtable-64.h | 20 ++++++++++++++++++++
>> arch/riscv/include/asm/pgtable-bits.h | 1 +
>> arch/riscv/include/asm/pgtable.h | 17 +++++++++++++++++
>> 4 files changed, 39 insertions(+)
>>
>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>> index 2724dc2af29f..0b74698c63c7 100644
>> --- a/arch/riscv/Kconfig
>> +++ b/arch/riscv/Kconfig
>> @@ -36,6 +36,7 @@ config RISCV
>> select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
>> select ARCH_HAS_PMEM_API
>> select ARCH_HAS_PREPARE_SYNC_CORE_CMD
>> + select ARCH_HAS_PTE_DEVMAP if 64BIT && MMU
>> select ARCH_HAS_PTE_SPECIAL
>> select ARCH_HAS_SET_DIRECT_MAP if MMU
>> select ARCH_HAS_SET_MEMORY if MMU
>> diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
>> index 221a5c1ee287..c67a9bbfd010 100644
>> --- a/arch/riscv/include/asm/pgtable-64.h
>> +++ b/arch/riscv/include/asm/pgtable-64.h
>> @@ -400,4 +400,24 @@ static inline struct page *pgd_page(pgd_t pgd)
>> #define p4d_offset p4d_offset
>> p4d_t *p4d_offset(pgd_t *pgd, unsigned long address);
>>
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +static inline int pte_devmap(pte_t pte);
>> +static inline pte_t pmd_pte(pmd_t pmd);
>> +
>> +static inline int pmd_devmap(pmd_t pmd)
>> +{
>> + return pte_devmap(pmd_pte(pmd));
>> +}
>> +
>> +static inline int pud_devmap(pud_t pud)
>> +{
>> + return 0;
>> +}
>> +
>> +static inline int pgd_devmap(pgd_t pgd)
>> +{
>> + return 0;
>> +}
>> +#endif
>> +
>> #endif /* _ASM_RISCV_PGTABLE_64_H */
>> diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h
>> index 179bd4afece4..a8f5205cea54 100644
>> --- a/arch/riscv/include/asm/pgtable-bits.h
>> +++ b/arch/riscv/include/asm/pgtable-bits.h
>> @@ -19,6 +19,7 @@
>> #define _PAGE_SOFT (3 << 8) /* Reserved for software */
>>
>> #define _PAGE_SPECIAL (1 << 8) /* RSW: 0x1 */
>> +#define _PAGE_DEVMAP (1 << 9) /* RSW, devmap */
>> #define _PAGE_TABLE _PAGE_PRESENT
>>
>> /*
>> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
>> index 7933f493db71..02fadc276064 100644
>> --- a/arch/riscv/include/asm/pgtable.h
>> +++ b/arch/riscv/include/asm/pgtable.h
>> @@ -387,6 +387,13 @@ static inline int pte_special(pte_t pte)
>> return pte_val(pte) & _PAGE_SPECIAL;
>> }
>>
>> +#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP
>> +static inline int pte_devmap(pte_t pte)
>> +{
>> + return pte_val(pte) & _PAGE_DEVMAP;
>> +}
>> +#endif
>
> Not sure you need the #ifdef here.
Without it, 32-bit builds break (with !defined(CONFIG_ARCH_HAS_PTE_DEVMAP)
there is already a default implementation). Maybe it's cleaner just to
use that instead?
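
Editorial sketch, not part of the thread: the build break mentioned here comes down to two definitions of the same function. Assuming the generic fallback is guarded by `#ifndef CONFIG_ARCH_HAS_PTE_DEVMAP`, as the reply indicates, an unguarded arch definition would collide with it whenever the option is off. The `DEMO_`/`demo2_` names below are hypothetical stand-ins, with `DEMO_ARCH_HAS_PTE_DEVMAP` playing the role of the Kconfig symbol.

```c
#include <assert.h>

/*
 * Hypothetical switch standing in for CONFIG_ARCH_HAS_PTE_DEVMAP.
 * It is left undefined here to model a 32-bit build, where the
 * Kconfig hunk in this patch does not select the option.
 */
/* #define DEMO_ARCH_HAS_PTE_DEVMAP */

#define DEMO_PAGE_DEVMAP (1UL << 9)

/* Hypothetical stand-in for pte_t (assumption for this demo). */
typedef struct { unsigned long pte; } demo2_pte_t;

#ifdef DEMO_ARCH_HAS_PTE_DEVMAP
/* Arch version: only compiled when the feature is selected. */
static inline int demo2_pte_devmap(demo2_pte_t pte)
{
	return (pte.pte & DEMO_PAGE_DEVMAP) != 0;
}
#endif

#ifndef DEMO_ARCH_HAS_PTE_DEVMAP
/* Generic fallback: the feature is off, so no entry is ever devmap. */
static inline int demo2_pte_devmap(demo2_pte_t pte)
{
	(void)pte;
	return 0;
}
#endif
```

Dropping the first `#ifdef` while the switch is undefined leaves two definitions of `demo2_pte_devmap()` in one translation unit, which the compiler rejects as a redefinition. Keeping the guard, or relying on the fallback alone, avoids that.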
>> +
>> /* static inline pte_t pte_rdprotect(pte_t pte) */
>>
>> static inline pte_t pte_wrprotect(pte_t pte)
>> @@ -428,6 +435,11 @@ static inline pte_t pte_mkspecial(pte_t pte)
>> return __pte(pte_val(pte) | _PAGE_SPECIAL);
>> }
>>
>> +static inline pte_t pte_mkdevmap(pte_t pte)
>> +{
>> + return __pte(pte_val(pte) | _PAGE_DEVMAP);
>> +}
>> +
>> static inline pte_t pte_mkhuge(pte_t pte)
>> {
>> return pte;
>> @@ -711,6 +723,11 @@ static inline pmd_t pmd_mkdirty(pmd_t pmd)
>> return pte_pmd(pte_mkdirty(pmd_pte(pmd)));
>> }
>>
>> +static inline pmd_t pmd_mkdevmap(pmd_t pmd)
>> +{
>> + return pte_pmd(pte_mkdevmap(pmd_pte(pmd)));
>> +}
>> +
>> static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
>> pmd_t *pmdp, pmd_t pmd)
>> {
>> --
>> 2.40.1
>>
>
> Otherwise, you can add:
>
> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Thank you!
Björn