* [PATCH] ARM64: Implement arch_report_meminfo()
@ 2023-12-08 21:11 Christoph Lameter (Ampere)
2023-12-12 18:31 ` Yang Shi
2023-12-14 13:02 ` Robin Murphy
0 siblings, 2 replies; 7+ messages in thread
From: Christoph Lameter (Ampere) @ 2023-12-08 21:11 UTC (permalink / raw)
To: Catalin Marinas, Marc Zyngier, Will Deacon, Ryan Roberts,
Mark Rutland, Vishal Moola
Cc: linux-arm-kernel, linux-mm
X86 has information in /proc/meminfo showing the use of large mappings
for the kernel direct map. This has now also become important for
ARM since the kernel default CONFIG_RODATA_FULL_DEFAULT_ENABLED
forces 4K PTE use for the direct map and users may not be aware
of the performance impact of the increased TLB use etc.
After this patch, the output of /proc/meminfo on ARM64 is:
4K page size:
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 155912 kB
CONT DMap4k: 1176 kB
DirectMap2M: 722944 kB
CONT DMap2M: 28672 kB
DirectMap1G: 534773760 kB
64K page size:
Hugepagesize: 524288 kB
Hugetlb: 0 kB
DirectMap64k: 882624 kB
CONT DMap64k: 19904 kB
DirectMap512M: 534773760 kB
CONT DMap512M: 0 kB
DirectMap4096G: 0 kB
Signed-off-by: Christoph Lameter <cl@linux.com>
Index: linux/arch/arm64/mm/mmu.c
===================================================================
--- linux.orig/arch/arm64/mm/mmu.c
+++ linux/arch/arm64/mm/mmu.c
@@ -66,6 +66,12 @@ u32 __boot_cpu_mode[] = { BOOT_CPU_MODE_
*/
long __section(".mmuoff.data.write") __early_cpu_boot_status;
+static atomic_t nr_pte;
+static atomic_t nr_pmd;
+static atomic_t nr_pud;
+static atomic_t nr_pte_cont;
+static atomic_t nr_pmd_cont;
+
/*
* Empty_zero_page is a special page that is used for zero-initialized data
* and COW.
@@ -179,6 +185,7 @@ static void init_pte(pmd_t *pmdp, unsign
pte_t old_pte = READ_ONCE(*ptep);
set_pte(ptep, pfn_pte(__phys_to_pfn(phys), prot));
+ atomic_inc(&nr_pte);
/*
* After the PTE entry has been populated once, we
@@ -223,8 +230,10 @@ static void alloc_init_cont_pte(pmd_t *p
/* use a contiguous mapping if the range is suitably aligned */
if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
- (flags & NO_CONT_MAPPINGS) == 0)
+ (flags & NO_CONT_MAPPINGS) == 0) {
__prot = __pgprot(pgprot_val(prot) | PTE_CONT);
+ atomic_inc(&nr_pte_cont);
+ }
init_pte(pmdp, addr, next, phys, __prot);
@@ -249,6 +258,7 @@ static void init_pmd(pud_t *pudp, unsign
if (((addr | next | phys) & ~PMD_MASK) == 0 &&
(flags & NO_BLOCK_MAPPINGS) == 0) {
pmd_set_huge(pmdp, phys, prot);
+ atomic_inc(&nr_pmd);
/*
* After the PMD entry has been populated once, we
@@ -301,8 +311,10 @@ static void alloc_init_cont_pmd(pud_t *p
/* use a contiguous mapping if the range is suitably aligned */
if ((((addr | next | phys) & ~CONT_PMD_MASK) == 0) &&
- (flags & NO_CONT_MAPPINGS) == 0)
+ (flags & NO_CONT_MAPPINGS) == 0) {
__prot = __pgprot(pgprot_val(prot) | PTE_CONT);
+ atomic_inc(&nr_pmd_cont);
+ }
init_pmd(pudp, addr, next, phys, __prot, pgtable_alloc, flags);
@@ -346,7 +358,7 @@ static void alloc_init_pud(pgd_t *pgdp,
((addr | next | phys) & ~PUD_MASK) == 0 &&
(flags & NO_BLOCK_MAPPINGS) == 0) {
pud_set_huge(pudp, phys, prot);
-
+ atomic_inc(&nr_pud);
/*
* After the PUD entry has been populated once, we
* only allow updates to the permission attributes.
@@ -1486,3 +1498,35 @@ void ptep_modify_prot_commit(struct vm_a
{
set_pte_at(vma->vm_mm, addr, ptep, pte);
}
+
+#ifdef CONFIG_PROC_FS
+void arch_report_meminfo(struct seq_file *m)
+{
+ unsigned long pagesize_in_kb = PAGE_SIZE / 1024;
+
+ seq_printf(m, "DirectMap%luk: %8lu kB\n",
+ pagesize_in_kb,
+ (unsigned long)atomic_read(&nr_pte) * pagesize_in_kb);
+
+ seq_printf(m, "CONT DMap%luk: %8lu kB\n",
+ pagesize_in_kb,
+ (unsigned long)atomic_read(&nr_pte_cont) * pagesize_in_kb);
+
+ pagesize_in_kb = PMD_SIZE / 1024;
+
+ seq_printf(m, "DirectMap%luM: %8lu kB\n",
+ pagesize_in_kb / 1024,
+ (unsigned long)atomic_read(&nr_pmd) * pagesize_in_kb);
+
+ seq_printf(m, "CONT DMap%luM: %8lu kB\n",
+ pagesize_in_kb / 1024,
+ (unsigned long)atomic_read(&nr_pmd_cont) * pagesize_in_kb);
+
+ pagesize_in_kb = PUD_SIZE / 1024;
+
+ seq_printf(m, "DirectMap%luG: %10lu kB\n",
+ pagesize_in_kb >> 20,
+ (unsigned long)atomic_read(&nr_pud) * pagesize_in_kb);
+}
+#endif
+
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ARM64: Implement arch_report_meminfo()
2023-12-08 21:11 [PATCH] ARM64: Implement arch_report_meminfo() Christoph Lameter (Ampere)
@ 2023-12-12 18:31 ` Yang Shi
2023-12-14 5:25 ` Christoph Lameter (Ampere)
2023-12-14 13:02 ` Robin Murphy
1 sibling, 1 reply; 7+ messages in thread
From: Yang Shi @ 2023-12-12 18:31 UTC (permalink / raw)
To: Christoph Lameter (Ampere)
Cc: Catalin Marinas, Marc Zyngier, Will Deacon, Ryan Roberts,
Mark Rutland, Vishal Moola, linux-arm-kernel, linux-mm
On Fri, Dec 8, 2023 at 1:12 PM Christoph Lameter (Ampere) <cl@gentwo.org> wrote:
>
> X86 has information in /proc/meminfo showing the use of large mappings
> for the kernel direct map. This has now also become important for
> ARM since the kernel default CONFIG_RODATA_FULL_DEFAULT_ENABLED
> forces 4K PTE use for the direct map and users may not be aware
> of the performance impact of the increased TLB use etc.
>
> The output of /proc/meminfo on ARM64 is then after this patch:
>
> 4K page size:
>
> Hugepagesize: 2048 kB
> Hugetlb: 0 kB
> DirectMap4k: 155912 kB
> CONT DMap4k: 1176 kB
> DirectMap2M: 722944 kB
> CONT DMap2M: 28672 kB
> DirectMap1G: 534773760 kB
>
> 64K page size:
>
> Hugepagesize: 524288 kB
> Hugetlb: 0 kB
> DirectMap64k: 882624 kB
> CONT DMap64k: 19904 kB
> DirectMap512M: 534773760 kB
> CONT DMap512M: 0 kB
> DirectMap4096G: 0 kB
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Index: linux/arch/arm64/mm/mmu.c
> ===================================================================
> --- linux.orig/arch/arm64/mm/mmu.c
> +++ linux/arch/arm64/mm/mmu.c
> @@ -66,6 +66,12 @@ u32 __boot_cpu_mode[] = { BOOT_CPU_MODE_
> */
> long __section(".mmuoff.data.write") __early_cpu_boot_status;
>
> +static atomic_t nr_pte;
> +static atomic_t nr_pmd;
> +static atomic_t nr_pud;
> +static atomic_t nr_pte_cont;
> +static atomic_t nr_pmd_cont;
These statistics are useful for debugging. However, why not use the
direct_pages_count[] array to save the counters like other
architectures, for example, x86, ppc and s390?
> +
> /*
> * Empty_zero_page is a special page that is used for zero-initialized data
> * and COW.
> @@ -179,6 +185,7 @@ static void init_pte(pmd_t *pmdp, unsign
> pte_t old_pte = READ_ONCE(*ptep);
>
> set_pte(ptep, pfn_pte(__phys_to_pfn(phys), prot));
> + atomic_inc(&nr_pte);
>
> /*
> * After the PTE entry has been populated once, we
> @@ -223,8 +230,10 @@ static void alloc_init_cont_pte(pmd_t *p
>
> /* use a contiguous mapping if the range is suitably aligned */
> if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
> - (flags & NO_CONT_MAPPINGS) == 0)
> + (flags & NO_CONT_MAPPINGS) == 0) {
> __prot = __pgprot(pgprot_val(prot) | PTE_CONT);
> + atomic_inc(&nr_pte_cont);
> + }
>
> init_pte(pmdp, addr, next, phys, __prot);
>
> @@ -249,6 +258,7 @@ static void init_pmd(pud_t *pudp, unsign
> if (((addr | next | phys) & ~PMD_MASK) == 0 &&
> (flags & NO_BLOCK_MAPPINGS) == 0) {
> pmd_set_huge(pmdp, phys, prot);
> + atomic_inc(&nr_pmd);
>
> /*
> * After the PMD entry has been populated once, we
> @@ -301,8 +311,10 @@ static void alloc_init_cont_pmd(pud_t *p
>
> /* use a contiguous mapping if the range is suitably aligned */
> if ((((addr | next | phys) & ~CONT_PMD_MASK) == 0) &&
> - (flags & NO_CONT_MAPPINGS) == 0)
> + (flags & NO_CONT_MAPPINGS) == 0) {
> __prot = __pgprot(pgprot_val(prot) | PTE_CONT);
> + atomic_inc(&nr_pmd_cont);
> + }
>
> init_pmd(pudp, addr, next, phys, __prot, pgtable_alloc, flags);
>
> @@ -346,7 +358,7 @@ static void alloc_init_pud(pgd_t *pgdp,
> ((addr | next | phys) & ~PUD_MASK) == 0 &&
> (flags & NO_BLOCK_MAPPINGS) == 0) {
> pud_set_huge(pudp, phys, prot);
> -
> + atomic_inc(&nr_pud);
> /*
> * After the PUD entry has been populated once, we
> * only allow updates to the permission attributes.
> @@ -1486,3 +1498,35 @@ void ptep_modify_prot_commit(struct vm_a
> {
> set_pte_at(vma->vm_mm, addr, ptep, pte);
> }
> +
> +#ifdef CONFIG_PROC_FS
> +void arch_report_meminfo(struct seq_file *m)
> +{
> + unsigned long pagesize_in_kb = PAGE_SIZE / 1024;
> +
> + seq_printf(m, "DirectMap%luk: %8lu kB\n",
> + pagesize_in_kb,
> + (unsigned long)atomic_read(&nr_pte) * pagesize_in_kb);
> +
> + seq_printf(m, "CONT DMap%luk: %8lu kB\n",
> + pagesize_in_kb,
> + (unsigned long)atomic_read(&nr_pte_cont) * pagesize_in_kb);
> +
> + pagesize_in_kb = PMD_SIZE / 1024;
> +
> + seq_printf(m, "DirectMap%luM: %8lu kB\n",
> + pagesize_in_kb / 1024,
> + (unsigned long)atomic_read(&nr_pmd) * pagesize_in_kb);
> +
> + seq_printf(m, "CONT DMap%luM: %8lu kB\n",
> + pagesize_in_kb / 1024,
> + (unsigned long)atomic_read(&nr_pmd_cont) * pagesize_in_kb);
> +
> + pagesize_in_kb = PUD_SIZE / 1024;
> +
> + seq_printf(m, "DirectMap%luG: %10lu kB\n",
> + pagesize_in_kb >> 20,
> + (unsigned long)atomic_read(&nr_pud) * pagesize_in_kb);
> +}
> +#endif
> +
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ARM64: Implement arch_report_meminfo()
2023-12-12 18:31 ` Yang Shi
@ 2023-12-14 5:25 ` Christoph Lameter (Ampere)
0 siblings, 0 replies; 7+ messages in thread
From: Christoph Lameter (Ampere) @ 2023-12-14 5:25 UTC (permalink / raw)
To: Yang Shi
Cc: Catalin Marinas, Marc Zyngier, Will Deacon, Ryan Roberts,
Mark Rutland, Vishal Moola, linux-arm-kernel, linux-mm
On Tue, 12 Dec 2023, Yang Shi wrote:
>> +static atomic_t nr_pte;
>> +static atomic_t nr_pmd;
>> +static atomic_t nr_pud;
>> +static atomic_t nr_pte_cont;
>> +static atomic_t nr_pmd_cont;
>
> These statistics are useful for debugging. However, why not use the
> direct_pages_count[] array to save the counters like other
> architectures, for example, x86, ppc and s390?
That is because ARM64 also has the CONT features. The code significantly
differs from x86.
Using the above naming scheme ties the values directly to what is supported
by the hardware and results in easier-to-read source code.
Calling this "direct" something is then a presentation issue.
That is actually something I was not sure about. "CONT DMap" is a bit
strange. I'd prefer to see PTE/PMD/PUD there, which makes it clearer to me.
But I guess others expect to see "Direct Pages" there since they are
used to it from x86.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ARM64: Implement arch_report_meminfo()
2023-12-08 21:11 [PATCH] ARM64: Implement arch_report_meminfo() Christoph Lameter (Ampere)
2023-12-12 18:31 ` Yang Shi
@ 2023-12-14 13:02 ` Robin Murphy
2023-12-14 21:35 ` Christoph Lameter (Ampere)
1 sibling, 1 reply; 7+ messages in thread
From: Robin Murphy @ 2023-12-14 13:02 UTC (permalink / raw)
To: Christoph Lameter (Ampere),
Catalin Marinas, Marc Zyngier, Will Deacon, Ryan Roberts,
Mark Rutland, Vishal Moola
Cc: linux-arm-kernel, linux-mm
On 2023-12-08 9:11 pm, Christoph Lameter (Ampere) wrote:
> X86 has information in /proc/meminfo showing the use of large mappings
> for the kernel direct map. This has now also become important for
> ARM since the kernel default CONFIG_RODATA_FULL_DEFAULT_ENABLED
> forces 4K PTE use for the direct map and users may not be aware
> of the performance impact of the increased TLB use etc.
>
> The output of /proc/meminfo on ARM64 is then after this patch:
>
> 4K page size:
>
> Hugepagesize: 2048 kB
> Hugetlb: 0 kB
> DirectMap4k: 155912 kB
> CONT DMap4k: 1176 kB
> DirectMap2M: 722944 kB
> CONT DMap2M: 28672 kB
> DirectMap1G: 534773760 kB
>
> 64K page size:
>
> Hugepagesize: 524288 kB
> Hugetlb: 0 kB
> DirectMap64k: 882624 kB
> CONT DMap64k: 19904 kB
> DirectMap512M: 534773760 kB
> CONT DMap512M: 0 kB
> DirectMap4096G: 0 kB
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Index: linux/arch/arm64/mm/mmu.c
> ===================================================================
> --- linux.orig/arch/arm64/mm/mmu.c
> +++ linux/arch/arm64/mm/mmu.c
> @@ -66,6 +66,12 @@ u32 __boot_cpu_mode[] = { BOOT_CPU_MODE_
> */
> long __section(".mmuoff.data.write") __early_cpu_boot_status;
>
> +static atomic_t nr_pte;
> +static atomic_t nr_pmd;
> +static atomic_t nr_pud;
> +static atomic_t nr_pte_cont;
> +static atomic_t nr_pmd_cont;
> +
> /*
> * Empty_zero_page is a special page that is used for zero-initialized
> data
> * and COW.
> @@ -179,6 +185,7 @@ static void init_pte(pmd_t *pmdp, unsign
> pte_t old_pte = READ_ONCE(*ptep);
>
> set_pte(ptep, pfn_pte(__phys_to_pfn(phys), prot));
> + atomic_inc(&nr_pte);
>
> /*
> * After the PTE entry has been populated once, we
> @@ -223,8 +230,10 @@ static void alloc_init_cont_pte(pmd_t *p
>
> /* use a contiguous mapping if the range is suitably aligned */
> if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
> - (flags & NO_CONT_MAPPINGS) == 0)
> + (flags & NO_CONT_MAPPINGS) == 0) {
> __prot = __pgprot(pgprot_val(prot) | PTE_CONT);
> + atomic_inc(&nr_pte_cont);
> + }
>
> init_pte(pmdp, addr, next, phys, __prot);
>
> @@ -249,6 +258,7 @@ static void init_pmd(pud_t *pudp, unsign
> if (((addr | next | phys) & ~PMD_MASK) == 0 &&
> (flags & NO_BLOCK_MAPPINGS) == 0) {
> pmd_set_huge(pmdp, phys, prot);
> + atomic_inc(&nr_pmd);
>
> /*
> * After the PMD entry has been populated once, we
> @@ -301,8 +311,10 @@ static void alloc_init_cont_pmd(pud_t *p
>
> /* use a contiguous mapping if the range is suitably aligned */
> if ((((addr | next | phys) & ~CONT_PMD_MASK) == 0) &&
> - (flags & NO_CONT_MAPPINGS) == 0)
> + (flags & NO_CONT_MAPPINGS) == 0) {
> __prot = __pgprot(pgprot_val(prot) | PTE_CONT);
> + atomic_inc(&nr_pmd_cont);
> + }
>
> init_pmd(pudp, addr, next, phys, __prot, pgtable_alloc, flags);
>
> @@ -346,7 +358,7 @@ static void alloc_init_pud(pgd_t *pgdp,
> ((addr | next | phys) & ~PUD_MASK) == 0 &&
> (flags & NO_BLOCK_MAPPINGS) == 0) {
> pud_set_huge(pudp, phys, prot);
> -
> + atomic_inc(&nr_pud);
It seems somewhat suspect that these counts only ever increase. It's not
often that we change or remove parts of the linear map, but it certainly
can happen.
Thanks,
Robin.
> /*
> * After the PUD entry has been populated once, we
> * only allow updates to the permission attributes.
> @@ -1486,3 +1498,35 @@ void ptep_modify_prot_commit(struct vm_a
> {
> set_pte_at(vma->vm_mm, addr, ptep, pte);
> }
> +
> +#ifdef CONFIG_PROC_FS
> +void arch_report_meminfo(struct seq_file *m)
> +{
> + unsigned long pagesize_in_kb = PAGE_SIZE / 1024;
> +
> + seq_printf(m, "DirectMap%luk: %8lu kB\n",
> + pagesize_in_kb,
> + (unsigned long)atomic_read(&nr_pte) * pagesize_in_kb);
> +
> + seq_printf(m, "CONT DMap%luk: %8lu kB\n",
> + pagesize_in_kb,
> + (unsigned long)atomic_read(&nr_pte_cont) * pagesize_in_kb);
> +
> + pagesize_in_kb = PMD_SIZE / 1024;
> +
> + seq_printf(m, "DirectMap%luM: %8lu kB\n",
> + pagesize_in_kb / 1024,
> + (unsigned long)atomic_read(&nr_pmd) * pagesize_in_kb);
> +
> + seq_printf(m, "CONT DMap%luM: %8lu kB\n",
> + pagesize_in_kb / 1024,
> + (unsigned long)atomic_read(&nr_pmd_cont) * pagesize_in_kb);
> +
> + pagesize_in_kb = PUD_SIZE / 1024;
> +
> + seq_printf(m, "DirectMap%luG: %10lu kB\n",
> + pagesize_in_kb >> 20,
> + (unsigned long)atomic_read(&nr_pud) *
> pagesize_in_kb);
> +}
> +#endif
> +
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ARM64: Implement arch_report_meminfo()
2023-12-14 13:02 ` Robin Murphy
@ 2023-12-14 21:35 ` Christoph Lameter (Ampere)
2023-12-15 19:44 ` Robin Murphy
0 siblings, 1 reply; 7+ messages in thread
From: Christoph Lameter (Ampere) @ 2023-12-14 21:35 UTC (permalink / raw)
To: Robin Murphy
Cc: Catalin Marinas, Marc Zyngier, Will Deacon, Ryan Roberts,
Mark Rutland, Vishal Moola, linux-arm-kernel, linux-mm
On Thu, 14 Dec 2023, Robin Murphy wrote:
> It seems somewhat suspect that these counts only ever increase. It's not
> often that we change or remove parts of the linear map, but it certainly can
> happen.
Well yes in the case of hotplug I guess ... Ok here is V2
From cl@gentwo.org Fri Dec 8 13:11:58 2023
From: "Christoph Lameter (Ampere)" <cl@gentwo.org>
To: Catalin Marinas <catalin.marinas@arm.com>, Marc Zyngier <maz@kernel.org>, Will Deacon <will@kernel.org>, Ryan Roberts <ryan.roberts@arm.com>, Mark Rutland <mark.rutland@arm.com>, Vishal Moola <vishal.moola@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org
Subject: [PATCH] ARM64: Implement arch_report_meminfo()
V1->V2: hotplug and unmapping support
X86 has information in /proc/meminfo showing the use of large mappings
for the kernel direct map. This has now also become important for
ARM since the kernel default CONFIG_RODATA_FULL_DEFAULT_ENABLED
forces 4K PTE use for the direct map and users may not be aware
of the performance impact of the increased TLB use etc.
The output of /proc/meminfo on ARM64 is then after this patch:
4K page size:
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 155912 kB
CONT DMap4k: 1176 kB
DirectMap2M: 722944 kB
CONT DMap2M: 28672 kB
DirectMap1G: 534773760 kB
64K page size:
Hugepagesize: 524288 kB
Hugetlb: 0 kB
DirectMap64k: 882624 kB
CONT DMap64k: 19904 kB
DirectMap512M: 534773760 kB
CONT DMap512M: 0 kB
DirectMap4096G: 0 kB
Signed-off-by: Christoph Lameter (Ampere) <cl@linux.com>
Index: linux/arch/arm64/mm/mmu.c
===================================================================
--- linux.orig/arch/arm64/mm/mmu.c
+++ linux/arch/arm64/mm/mmu.c
@@ -76,6 +76,13 @@ EXPORT_SYMBOL(empty_zero_page);
static DEFINE_SPINLOCK(swapper_pgdir_lock);
static DEFINE_MUTEX(fixmap_lock);
+static atomic_t nr_pte;
+static atomic_t nr_pmd;
+static atomic_t nr_pud;
+static atomic_t nr_pte_cont;
+static atomic_t nr_pmd_cont;
+
+
void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
{
pgd_t *fixmap_pgdp;
@@ -179,6 +186,7 @@ static void init_pte(pmd_t *pmdp, unsign
pte_t old_pte = READ_ONCE(*ptep);
set_pte(ptep, pfn_pte(__phys_to_pfn(phys), prot));
+ atomic_inc(&nr_pte);
/*
* After the PTE entry has been populated once, we
@@ -223,8 +231,10 @@ static void alloc_init_cont_pte(pmd_t *p
/* use a contiguous mapping if the range is suitably aligned */
if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
- (flags & NO_CONT_MAPPINGS) == 0)
+ (flags & NO_CONT_MAPPINGS) == 0) {
__prot = __pgprot(pgprot_val(prot) | PTE_CONT);
+ atomic_inc(&nr_pte_cont);
+ }
init_pte(pmdp, addr, next, phys, __prot);
@@ -249,6 +259,7 @@ static void init_pmd(pud_t *pudp, unsign
if (((addr | next | phys) & ~PMD_MASK) == 0 &&
(flags & NO_BLOCK_MAPPINGS) == 0) {
pmd_set_huge(pmdp, phys, prot);
+ atomic_inc(&nr_pmd);
/*
* After the PMD entry has been populated once, we
@@ -301,8 +312,10 @@ static void alloc_init_cont_pmd(pud_t *p
/* use a contiguous mapping if the range is suitably aligned */
if ((((addr | next | phys) & ~CONT_PMD_MASK) == 0) &&
- (flags & NO_CONT_MAPPINGS) == 0)
+ (flags & NO_CONT_MAPPINGS) == 0) {
__prot = __pgprot(pgprot_val(prot) | PTE_CONT);
+ atomic_inc(&nr_pmd_cont);
+ }
init_pmd(pudp, addr, next, phys, __prot, pgtable_alloc, flags);
@@ -346,7 +359,7 @@ static void alloc_init_pud(pgd_t *pgdp,
((addr | next | phys) & ~PUD_MASK) == 0 &&
(flags & NO_BLOCK_MAPPINGS) == 0) {
pud_set_huge(pudp, phys, prot);
-
+ atomic_inc(&nr_pud);
/*
* After the PUD entry has been populated once, we
* only allow updates to the permission attributes.
@@ -859,6 +872,11 @@ static void unmap_hotplug_pte_range(pmd_
continue;
WARN_ON(!pte_present(pte));
+
+ if (pte_cont(pte))
+ atomic_dec(&nr_pte_cont);
+ atomic_dec(&nr_pte);
+
pte_clear(&init_mm, addr, ptep);
flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
if (free_mapped)
@@ -883,6 +901,11 @@ static void unmap_hotplug_pmd_range(pud_
WARN_ON(!pmd_present(pmd));
if (pmd_sect(pmd)) {
+
+ if (pmd_cont(pmd))
+ atomic_dec(&nr_pmd_cont);
+ atomic_dec(&nr_pmd);
+
pmd_clear(pmdp);
/*
@@ -916,6 +939,9 @@ static void unmap_hotplug_pud_range(p4d_
WARN_ON(!pud_present(pud));
if (pud_sect(pud)) {
+
+ atomic_dec(&nr_pud);
+
pud_clear(pudp);
/*
@@ -1010,6 +1036,7 @@ static void free_empty_pte_table(pmd_t *
return;
}
+ atomic_dec(&nr_pmd);
pmd_clear(pmdp);
__flush_tlb_kernel_pgtable(start);
free_hotplug_pgtable_page(virt_to_page(ptep));
@@ -1050,6 +1077,7 @@ static void free_empty_pmd_table(pud_t *
return;
}
+ atomic_dec(&nr_pud);
pud_clear(pudp);
__flush_tlb_kernel_pgtable(start);
free_hotplug_pgtable_page(virt_to_page(pmdp));
@@ -1225,6 +1253,7 @@ int pmd_free_pte_page(pmd_t *pmdp, unsig
}
table = pte_offset_kernel(pmdp, addr);
+ atomic_dec(&nr_pmd);
pmd_clear(pmdp);
__flush_tlb_kernel_pgtable(addr);
pte_free_kernel(NULL, table);
@@ -1253,6 +1282,7 @@ int pud_free_pmd_page(pud_t *pudp, unsig
pmd_free_pte_page(pmdp, next);
} while (pmdp++, next += PMD_SIZE, next != end);
+ atomic_dec(&nr_pud);
pud_clear(pudp);
__flush_tlb_kernel_pgtable(addr);
pmd_free(NULL, table);
@@ -1486,3 +1516,36 @@ void ptep_modify_prot_commit(struct vm_a
{
set_pte_at(vma->vm_mm, addr, ptep, pte);
}
+
+
+#ifdef CONFIG_PROC_FS
+void arch_report_meminfo(struct seq_file *m)
+{
+ unsigned long pagesize_in_kb = PAGE_SIZE / 1024;
+
+ seq_printf(m, "DirectMap%luk: %8lu kB\n",
+ pagesize_in_kb,
+ (unsigned long)atomic_read(&nr_pte) * pagesize_in_kb);
+
+ seq_printf(m, "CONT DMap%luk: %8lu kB\n",
+ pagesize_in_kb,
+ (unsigned long)atomic_read(&nr_pte_cont) * pagesize_in_kb);
+
+ pagesize_in_kb = PMD_SIZE / 1024;
+
+ seq_printf(m, "DirectMap%luM: %8lu kB\n",
+ pagesize_in_kb / 1024,
+ (unsigned long)atomic_read(&nr_pmd) * pagesize_in_kb);
+
+ seq_printf(m, "CONT DMap%luM: %8lu kB\n",
+ pagesize_in_kb / 1024,
+ (unsigned long)atomic_read(&nr_pmd_cont) * pagesize_in_kb);
+
+ pagesize_in_kb = PUD_SIZE / 1024;
+
+ seq_printf(m, "DirectMap%luG: %10lu kB\n",
+ pagesize_in_kb >> 20,
+ (unsigned long)atomic_read(&nr_pud) * pagesize_in_kb);
+}
+#endif
+
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ARM64: Implement arch_report_meminfo()
2023-12-14 21:35 ` Christoph Lameter (Ampere)
@ 2023-12-15 19:44 ` Robin Murphy
2023-12-18 17:49 ` Christoph Lameter (Ampere)
0 siblings, 1 reply; 7+ messages in thread
From: Robin Murphy @ 2023-12-15 19:44 UTC (permalink / raw)
To: Christoph Lameter (Ampere)
Cc: Catalin Marinas, Marc Zyngier, Will Deacon, Ryan Roberts,
Mark Rutland, Vishal Moola, linux-arm-kernel, linux-mm
On 14/12/2023 9:35 pm, Christoph Lameter (Ampere) wrote:
> On Thu, 14 Dec 2023, Robin Murphy wrote:
>
>> It seems somewhat suspect that these counts only ever increase. It's
>> not often that we change or remove parts of the linear map, but it
>> certainly can happen.
>
> Well yes in the case of hotplug I guess ... Ok here is V2
There are also paths where we remove and reinstate parts of the linear
map via set_memory_valid(), set_direct_map_*(), and possibly others. If
we're exposing a user ABI that claims to be accounting kernel VA
mappings, then I think users are within their rights to expect it to
actually account kernel VA mappings, not just expose numbers whose only
guaranteed significance is whether they are zero or nonzero.
Looking again, am I also right in thinking that what I assumed were the
non-contiguous counts here are actually total counts of *either* type of
mapping at that level, and inclusive of the contiguous counts? If so,
that seems a bit non-obvious - my intuitive expectation would be for the
sum of all these numbers to represent the total amount of direct-mapped
RAM, where either we're interested in each distinct type of mapping and
accounting them all separately, or we're simply interested in the
general shape of the pagetables, and thus would account per-level and
ignore the contiguous bit since we don't know whether it's actually
doing anything useful anyway.
Thanks,
Robin.
> From cl@gentwo.org Fri Dec 8 13:11:58 2023
> From: "Christoph Lameter (Ampere)" <cl@gentwo.org>
> To: Catalin Marinas <catalin.marinas@arm.com>, Marc Zyngier
> <maz@kernel.org>, Will Deacon <will@kernel.org>, Ryan Roberts
> <ryan.roberts@arm.com>, Mark Rutland <mark.rutland@arm.com>, Vishal
> Moola <vishal.moola@gmail.com>
> Cc: linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org
> Subject: [PATCH] ARM64: Implement arch_report_meminfo()
>
> V1->V2: hotplug and unmapping support
>
> X86 has information in /proc/meminfo showing the use of large mappings
> for the kernel direct map. This has now also become important for
> ARM since the kernel default CONFIG_RODATA_FULL_DEFAULT_ENABLED
> forces 4K PTE use for the direct map and users may not be aware
> of the performance impact of the increased TLB use etc.
>
> The output of /proc/meminfo on ARM64 is then after this patch:
>
> 4K page size:
>
> Hugepagesize: 2048 kB
> Hugetlb: 0 kB
> DirectMap4k: 155912 kB
> CONT DMap4k: 1176 kB
> DirectMap2M: 722944 kB
> CONT DMap2M: 28672 kB
> DirectMap1G: 534773760 kB
>
> 64K page size:
>
> Hugepagesize: 524288 kB
> Hugetlb: 0 kB
> DirectMap64k: 882624 kB
> CONT DMap64k: 19904 kB
> DirectMap512M: 534773760 kB
> CONT DMap512M: 0 kB
> DirectMap4096G: 0 kB
>
> Signed-off-by: Christoph Lameter (Ampere) <cl@linux.com>
>
> Index: linux/arch/arm64/mm/mmu.c
> ===================================================================
> --- linux.orig/arch/arm64/mm/mmu.c
> +++ linux/arch/arm64/mm/mmu.c
> @@ -76,6 +76,13 @@ EXPORT_SYMBOL(empty_zero_page);
> static DEFINE_SPINLOCK(swapper_pgdir_lock);
> static DEFINE_MUTEX(fixmap_lock);
>
> +static atomic_t nr_pte;
> +static atomic_t nr_pmd;
> +static atomic_t nr_pud;
> +static atomic_t nr_pte_cont;
> +static atomic_t nr_pmd_cont;
> +
> +
> void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
> {
> pgd_t *fixmap_pgdp;
> @@ -179,6 +186,7 @@ static void init_pte(pmd_t *pmdp, unsign
> pte_t old_pte = READ_ONCE(*ptep);
>
> set_pte(ptep, pfn_pte(__phys_to_pfn(phys), prot));
> + atomic_inc(&nr_pte);
>
> /*
> * After the PTE entry has been populated once, we
> @@ -223,8 +231,10 @@ static void alloc_init_cont_pte(pmd_t *p
>
> /* use a contiguous mapping if the range is suitably aligned */
> if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
> - (flags & NO_CONT_MAPPINGS) == 0)
> + (flags & NO_CONT_MAPPINGS) == 0) {
> __prot = __pgprot(pgprot_val(prot) | PTE_CONT);
> + atomic_inc(&nr_pte_cont);
> + }
>
> init_pte(pmdp, addr, next, phys, __prot);
>
> @@ -249,6 +259,7 @@ static void init_pmd(pud_t *pudp, unsign
> if (((addr | next | phys) & ~PMD_MASK) == 0 &&
> (flags & NO_BLOCK_MAPPINGS) == 0) {
> pmd_set_huge(pmdp, phys, prot);
> + atomic_inc(&nr_pmd);
>
> /*
> * After the PMD entry has been populated once, we
> @@ -301,8 +312,10 @@ static void alloc_init_cont_pmd(pud_t *p
>
> /* use a contiguous mapping if the range is suitably aligned */
> if ((((addr | next | phys) & ~CONT_PMD_MASK) == 0) &&
> - (flags & NO_CONT_MAPPINGS) == 0)
> + (flags & NO_CONT_MAPPINGS) == 0) {
> __prot = __pgprot(pgprot_val(prot) | PTE_CONT);
> + atomic_inc(&nr_pmd_cont);
> + }
>
> init_pmd(pudp, addr, next, phys, __prot, pgtable_alloc, flags);
>
> @@ -346,7 +359,7 @@ static void alloc_init_pud(pgd_t *pgdp,
> ((addr | next | phys) & ~PUD_MASK) == 0 &&
> (flags & NO_BLOCK_MAPPINGS) == 0) {
> pud_set_huge(pudp, phys, prot);
> -
> + atomic_inc(&nr_pud);
> /*
> * After the PUD entry has been populated once, we
> * only allow updates to the permission attributes.
> @@ -859,6 +872,11 @@ static void unmap_hotplug_pte_range(pmd_
> continue;
>
> WARN_ON(!pte_present(pte));
> +
> + if (pte_cont(pte))
> + atomic_dec(&nr_pte_cont);
> + atomic_dec(&nr_pte);
> +
> pte_clear(&init_mm, addr, ptep);
> flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> if (free_mapped)
> @@ -883,6 +901,11 @@ static void unmap_hotplug_pmd_range(pud_
>
> WARN_ON(!pmd_present(pmd));
> if (pmd_sect(pmd)) {
> +
> + if (pmd_cont(pmd))
> + atomic_dec(&nr_pmd_cont);
> + atomic_dec(&nr_pmd);
> +
> pmd_clear(pmdp);
>
> /*
> @@ -916,6 +939,9 @@ static void unmap_hotplug_pud_range(p4d_
>
> WARN_ON(!pud_present(pud));
> if (pud_sect(pud)) {
> +
> + atomic_dec(&nr_pud);
> +
> pud_clear(pudp);
>
> /*
> @@ -1010,6 +1036,7 @@ static void free_empty_pte_table(pmd_t *
> return;
> }
>
> + atomic_dec(&nr_pmd);
> pmd_clear(pmdp);
> __flush_tlb_kernel_pgtable(start);
> free_hotplug_pgtable_page(virt_to_page(ptep));
> @@ -1050,6 +1077,7 @@ static void free_empty_pmd_table(pud_t *
> return;
> }
>
> + atomic_dec(&nr_pud);
> pud_clear(pudp);
> __flush_tlb_kernel_pgtable(start);
> free_hotplug_pgtable_page(virt_to_page(pmdp));
> @@ -1225,6 +1253,7 @@ int pmd_free_pte_page(pmd_t *pmdp, unsig
> }
>
> table = pte_offset_kernel(pmdp, addr);
> + atomic_dec(&nr_pmd);
> pmd_clear(pmdp);
> __flush_tlb_kernel_pgtable(addr);
> pte_free_kernel(NULL, table);
> @@ -1253,6 +1282,7 @@ int pud_free_pmd_page(pud_t *pudp, unsig
> pmd_free_pte_page(pmdp, next);
> } while (pmdp++, next += PMD_SIZE, next != end);
>
> + atomic_dec(&nr_pud);
> pud_clear(pudp);
> __flush_tlb_kernel_pgtable(addr);
> pmd_free(NULL, table);
> @@ -1486,3 +1516,36 @@ void ptep_modify_prot_commit(struct vm_a
> {
> set_pte_at(vma->vm_mm, addr, ptep, pte);
> }
> +
> +
> +#ifdef CONFIG_PROC_FS
> +void arch_report_meminfo(struct seq_file *m)
> +{
> + unsigned long pagesize_in_kb = PAGE_SIZE / 1024;
> +
> + seq_printf(m, "DirectMap%luk: %8lu kB\n",
> + pagesize_in_kb,
> + (unsigned long)atomic_read(&nr_pte) * pagesize_in_kb);
> +
> + seq_printf(m, "CONT DMap%luk: %8lu kB\n",
> + pagesize_in_kb,
> + (unsigned long)atomic_read(&nr_pte_cont) * pagesize_in_kb);
> +
> + pagesize_in_kb = PMD_SIZE / 1024;
> +
> + seq_printf(m, "DirectMap%luM: %8lu kB\n",
> + pagesize_in_kb / 1024,
> + (unsigned long)atomic_read(&nr_pmd) * pagesize_in_kb);
> +
> + seq_printf(m, "CONT DMap%luM: %8lu kB\n",
> + pagesize_in_kb / 1024,
> + (unsigned long)atomic_read(&nr_pmd_cont) * pagesize_in_kb);
> +
> + pagesize_in_kb = PUD_SIZE / 1024;
> +
> + seq_printf(m, "DirectMap%luG: %10lu kB\n",
> + pagesize_in_kb >> 20,
> + (unsigned long)atomic_read(&nr_pud) * pagesize_in_kb);
> +}
> +#endif
> +
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ARM64: Implement arch_report_meminfo()
2023-12-15 19:44 ` Robin Murphy
@ 2023-12-18 17:49 ` Christoph Lameter (Ampere)
0 siblings, 0 replies; 7+ messages in thread
From: Christoph Lameter (Ampere) @ 2023-12-18 17:49 UTC (permalink / raw)
To: Robin Murphy
Cc: Catalin Marinas, Marc Zyngier, Will Deacon, Ryan Roberts,
Mark Rutland, Vishal Moola, linux-arm-kernel, linux-mm
On Fri, 15 Dec 2023, Robin Murphy wrote:
> On 14/12/2023 9:35 pm, Christoph Lameter (Ampere) wrote:
>> On Thu, 14 Dec 2023, Robin Murphy wrote:
>>
>>> It seems somewhat suspect that these counts only ever increase. It's not
>>> often that we change or remove parts of the linear map, but it certainly
>>> can happen.
>>
>> Well yes in the case of hotplug I guess ... Ok here is V2
>
> There are also paths where we remove and reinstate parts of the linear map
> via set_memory_valid(), set_direct_map_*(), and possibly others. If we're
> exposing a user ABI that claims to be accounting kernel VA mappings, then I
> think users are within their rights to expect it to actually account kernel
> VA mappings, not just expose numbers whose only guaranteed significance is
> whether they are zero or nonzero.
set_memory_valid() changes mappings via __change_memory_common()
and apply_to_page_range(). It seems that apply_to_page_range() creates
PTEs as needed etc.
However, I do not see any accounting for direct map modifications
on x86 either. Since this was satisfactory for x86, I don't
believe that more is needed. Introducing atomics in potentially
performance-sensitive functions that run while the kernel is up is not
that advisable, and doing so would require core kernel changes going
beyond the enablement of arch_report_meminfo() on ARM64.
> Looking again, am I also right in thinking that what I assumed were the
> non-contiguous counts here are actually total counts of *either* type of
> mapping at that level, and inclusive of the contiguous counts? If so, that
> seems a bit non-obvious - my intuitive expectation would be for the sum of
> all these numbers to represent the total amount of direct-mapped RAM, where
> either we're interested in each distinct type of mapping and accounting them
> all separately, or we're simply interested in the general shape of the
> pagetables, and thus would account per-level and ignore the contiguous bit
> since we don't know whether it's actually doing anything useful anyway.
Yes, the CONT PTEs are a subset of the other counts since they are only a
special case of the PTE type. They are important for performance on ARM64,
in particular with their anticipated use for the various page sizes
supported in the kernel following the introduction of folios for the page
cache.
The problem with CONT_PTE is that it is not clear whether the architecture
supports it or not. The amount of CONT_PTE mappings can influence the TLB
coverage possible in kernel space.
We are generally interested in the shape of the page tables. If the user
later runs workloads that require degrading parts of the map to smaller
mappings, then that is load dependent.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-12-18 17:50 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-08 21:11 [PATCH] ARM64: Implement arch_report_meminfo() Christoph Lameter (Ampere)
2023-12-12 18:31 ` Yang Shi
2023-12-14 5:25 ` Christoph Lameter (Ampere)
2023-12-14 13:02 ` Robin Murphy
2023-12-14 21:35 ` Christoph Lameter (Ampere)
2023-12-15 19:44 ` Robin Murphy
2023-12-18 17:49 ` Christoph Lameter (Ampere)
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox