linux-mm.kvack.org archive mirror
* Re: [RFC PATCH v2 00/21] riscv: Introduce 64K base page
       [not found] <20241205103729.14798-1-luxu.kernel@bytedance.com>
@ 2024-12-06  2:00 ` Zi Yan
  2024-12-06  2:41   ` [External] " Xu Lu
  2024-12-06 10:13   ` David Hildenbrand
  0 siblings, 2 replies; 8+ messages in thread
From: Zi Yan @ 2024-12-06  2:00 UTC (permalink / raw)
  To: Xu Lu
  Cc: paul.walmsley, palmer, aou, ardb, anup, atishp, xieyongji,
	lihangjing, punit.agrawal, linux-kernel, linux-riscv, Linux MM

On 5 Dec 2024, at 5:37, Xu Lu wrote:

> This patch series attempts to break through the limitation of the MMU
> and support a larger base page on RISC-V, which currently supports
> only a 4K page size. The key idea is to always manage and allocate
> memory at a granularity of 64K and use SVNAPOT to accelerate address
> translation. This is the second version; a detailed introduction can
> be found in [1].
>
> Changes from v1:
> - Rebase on v6.12.
>
> - Adjust the page table entry shift to reduce page table memory usage.
>     For example, in SV39, the traditional va behaves as:
>
>     ----------------------------------------------
>     | pgd index | pmd index | pte index | offset |
>     ----------------------------------------------
>     | 38     30 | 29     21 | 20     12 | 11   0 |
>     ----------------------------------------------
>
>     When we choose 64K as the basic software page, the va now behaves as:
>
>     ----------------------------------------------
>     | pgd index | pmd index | pte index | offset |
>     ----------------------------------------------
>     | 38     34 | 33     25 | 24     16 | 15   0 |
>     ----------------------------------------------
>
> - Fix some bugs in v1.
>
> Thanks in advance for comments.
>
> [1] https://lwn.net/Articles/952722/
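
A minimal sketch of the 64K index split quoted above, with the shifts
read directly off the diagram (illustrative only, not code from the
series):

#include <stdint.h>
#include <stdio.h>

#define SW_PAGE_SHIFT 16               /* 64K software page */
#define PTE_SHIFT     SW_PAGE_SHIFT    /* pte index: bits 24..16 */
#define PMD_SHIFT     (PTE_SHIFT + 9)  /* pmd index: bits 33..25 */
#define PGD_SHIFT     (PMD_SHIFT + 9)  /* pgd index: bits 38..34 */

static unsigned long va_index(uint64_t va, int shift, int bits)
{
	return (va >> shift) & ((1UL << bits) - 1);
}

int main(void)
{
	uint64_t va = 0x123456789aULL;  /* any SV39 virtual address */

	printf("pgd=%lu pmd=%lu pte=%lu off=%lu\n",
	       va_index(va, PGD_SHIFT, 5),
	       va_index(va, PMD_SHIFT, 9),
	       va_index(va, PTE_SHIFT, 9),
	       va_index(va, 0, SW_PAGE_SHIFT));
	return 0;
}

The traditional 4K layout in the first diagram corresponds to the same
helper with shifts 12/21/30 and a 9-bit pgd index.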

This looks very interesting. Can you cc me and linux-mm@kvack.org
in the future? Thanks.

Have you thought about doing it for ARM64 4KB as well? ARM64's contig PTE
should have a similar effect to RISC-V's SVNAPOT, right?
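
For readers less familiar with the two mechanisms: both mark a
naturally aligned run of base-page PTEs so that the TLB can cache the
run as a single entry. A minimal sketch of the common idea (bit
positions taken from the respective architecture manuals; the helper
itself is hypothetical):

#include <stdint.h>

#define ARM64_PTE_CONT  (1ULL << 52)   /* arm64 Contiguous hint bit */
#define RISCV_PTE_NAPOT (1ULL << 63)   /* RISC-V Svnapot N bit */

/* mark n consecutive hardware PTEs as one larger translation,
 * e.g. n = 16 for a 64K run of 4K pages on either architecture */
static void mark_run(uint64_t *ptep, int n, uint64_t bit)
{
	int i;

	for (i = 0; i < n; i++)
		ptep[i] |= bit;
}

Svnapot additionally encodes the region size in the low ppn bits of
each entry.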

>
> Xu Lu (21):
>   riscv: mm: Distinguish hardware base page and software base page
>   riscv: mm: Configure satp with hw page pfn
>   riscv: mm: Reimplement page table entry structures
>   riscv: mm: Reimplement page table entry constructor function
>   riscv: mm: Reimplement conversion functions between page table entry
>   riscv: mm: Avoid pte constructor during pte conversion
>   riscv: mm: Reimplement page table entry get function
>   riscv: mm: Reimplement page table entry atomic get function
>   riscv: mm: Replace READ_ONCE with atomic pte get function
>   riscv: mm: Reimplement PTE A/D bit check function
>   riscv: mm: Reimplement mk_huge_pte function
>   riscv: mm: Reimplement tlb flush function
>   riscv: mm: Adjust PGDIR/P4D/PUD/PMD_SHIFT
>   riscv: mm: Only apply svnapot region bigger than software page
>   riscv: mm: Adjust FIX_BTMAPS_SLOTS for variable PAGE_SIZE
>   riscv: mm: Adjust FIX_FDT_SIZE for variable PMD_SIZE
>   riscv: mm: Apply Svnapot for base page mapping if possible
>   riscv: Kconfig: Introduce 64K page size
>   riscv: Kconfig: Adjust mmap rnd bits for 64K Page
>   riscv: mm: Adjust address space layout and init page table for 64K
>     Page
>   riscv: mm: Update EXEC_PAGESIZE for 64K Page
>
>  arch/riscv/Kconfig                    |  34 +-
>  arch/riscv/include/asm/fixmap.h       |   3 +-
>  arch/riscv/include/asm/hugetlb.h      |   5 +
>  arch/riscv/include/asm/page.h         |  56 ++-
>  arch/riscv/include/asm/pgtable-32.h   |  12 +-
>  arch/riscv/include/asm/pgtable-64.h   | 128 ++++--
>  arch/riscv/include/asm/pgtable-bits.h |   3 +-
>  arch/riscv/include/asm/pgtable.h      | 564 +++++++++++++++++++++++---
>  arch/riscv/include/asm/tlbflush.h     |  26 +-
>  arch/riscv/include/uapi/asm/param.h   |  24 ++
>  arch/riscv/kernel/head.S              |   4 +-
>  arch/riscv/kernel/hibernate.c         |  21 +-
>  arch/riscv/mm/context.c               |   7 +-
>  arch/riscv/mm/fault.c                 |  15 +-
>  arch/riscv/mm/hugetlbpage.c           |  30 +-
>  arch/riscv/mm/init.c                  |  45 +-
>  arch/riscv/mm/kasan_init.c            |   7 +-
>  arch/riscv/mm/pgtable.c               | 111 ++++-
>  arch/riscv/mm/tlbflush.c              |  31 +-
>  arch/s390/include/asm/hugetlb.h       |   2 +-
>  include/asm-generic/hugetlb.h         |   5 +-
>  include/linux/pgtable.h               |  21 +
>  kernel/events/core.c                  |   6 +-
>  mm/debug_vm_pgtable.c                 |   6 +-
>  mm/gup.c                              |  10 +-
>  mm/hmm.c                              |   2 +-
>  mm/hugetlb.c                          |   4 +-
>  mm/mapping_dirty_helpers.c            |   2 +-
>  mm/memory.c                           |   4 +-
>  mm/mprotect.c                         |   2 +-
>  mm/ptdump.c                           |   8 +-
>  mm/sparse-vmemmap.c                   |   2 +-
>  mm/vmscan.c                           |   2 +-
>  33 files changed, 1029 insertions(+), 173 deletions(-)
>  create mode 100644 arch/riscv/include/uapi/asm/param.h
>
> -- 
> 2.20.1


Best Regards,
Yan, Zi



* Re: [External] Re: [RFC PATCH v2 00/21] riscv: Introduce 64K base page
  2024-12-06  2:00 ` [RFC PATCH v2 00/21] riscv: Introduce 64K base page Zi Yan
@ 2024-12-06  2:41   ` Xu Lu
  2024-12-06 10:13   ` David Hildenbrand
  1 sibling, 0 replies; 8+ messages in thread
From: Xu Lu @ 2024-12-06  2:41 UTC (permalink / raw)
  To: Zi Yan
  Cc: paul.walmsley, palmer, aou, ardb, anup, atishp, xieyongji,
	lihangjing, punit.agrawal, linux-kernel, linux-riscv, Linux MM

Hi Zi Yan,

On Fri, Dec 6, 2024 at 10:00 AM Zi Yan <ziy@nvidia.com> wrote:
>
> On 5 Dec 2024, at 5:37, Xu Lu wrote:
>
> > This patch series attempts to break through the limitation of the MMU
> > and support a larger base page on RISC-V, which currently supports
> > only a 4K page size. The key idea is to always manage and allocate
> > memory at a granularity of 64K and use SVNAPOT to accelerate address
> > translation. This is the second version; a detailed introduction can
> > be found in [1].
> >
> > Changes from v1:
> > - Rebase on v6.12.
> >
> > - Adjust the page table entry shift to reduce page table memory usage.
> >     For example, in SV39, the traditional va behaves as:
> >
> >     ----------------------------------------------
> >     | pgd index | pmd index | pte index | offset |
> >     ----------------------------------------------
> >     | 38     30 | 29     21 | 20     12 | 11   0 |
> >     ----------------------------------------------
> >
> >     When we choose 64K as the basic software page, the va now behaves as:
> >
> >     ----------------------------------------------
> >     | pgd index | pmd index | pte index | offset |
> >     ----------------------------------------------
> >     | 38     34 | 33     25 | 24     16 | 15   0 |
> >     ----------------------------------------------
> >
> > - Fix some bugs in v1.
> >
> > Thanks in advance for comments.
> >
> > [1] https://lwn.net/Articles/952722/
>
> This looks very interesting. Can you cc me and linux-mm@kvack.org
> in the future? Thanks.

Of course. I hope this patch series can be of some help.

>
> Have you thought about doing it for ARM64 4KB as well? ARM64's contig PTE
> should have a similar effect to RISC-V's SVNAPOT, right?

I have not thought about it yet. ARM64 has a native 64K MMU: the
kernel can directly configure the page size as 64K and the MMU will
translate at that granularity. So I doubt there is a need to implement
a 64K page size based on CONT PTE. If you want to use CONT PTE for
acceleration instead of the 64K MMU, you could try THP_CONTPTE [1],
which has already been merged.

[1] https://lwn.net/Articles/935887/

Best regards,

Xu Lu

>
> >
> > Xu Lu (21):
> >   riscv: mm: Distinguish hardware base page and software base page
> >   riscv: mm: Configure satp with hw page pfn
> >   riscv: mm: Reimplement page table entry structures
> >   riscv: mm: Reimplement page table entry constructor function
> >   riscv: mm: Reimplement conversion functions between page table entry
> >   riscv: mm: Avoid pte constructor during pte conversion
> >   riscv: mm: Reimplement page table entry get function
> >   riscv: mm: Reimplement page table entry atomic get function
> >   riscv: mm: Replace READ_ONCE with atomic pte get function
> >   riscv: mm: Reimplement PTE A/D bit check function
> >   riscv: mm: Reimplement mk_huge_pte function
> >   riscv: mm: Reimplement tlb flush function
> >   riscv: mm: Adjust PGDIR/P4D/PUD/PMD_SHIFT
> >   riscv: mm: Only apply svnapot region bigger than software page
> >   riscv: mm: Adjust FIX_BTMAPS_SLOTS for variable PAGE_SIZE
> >   riscv: mm: Adjust FIX_FDT_SIZE for variable PMD_SIZE
> >   riscv: mm: Apply Svnapot for base page mapping if possible
> >   riscv: Kconfig: Introduce 64K page size
> >   riscv: Kconfig: Adjust mmap rnd bits for 64K Page
> >   riscv: mm: Adjust address space layout and init page table for 64K
> >     Page
> >   riscv: mm: Update EXEC_PAGESIZE for 64K Page
> >
> >  arch/riscv/Kconfig                    |  34 +-
> >  arch/riscv/include/asm/fixmap.h       |   3 +-
> >  arch/riscv/include/asm/hugetlb.h      |   5 +
> >  arch/riscv/include/asm/page.h         |  56 ++-
> >  arch/riscv/include/asm/pgtable-32.h   |  12 +-
> >  arch/riscv/include/asm/pgtable-64.h   | 128 ++++--
> >  arch/riscv/include/asm/pgtable-bits.h |   3 +-
> >  arch/riscv/include/asm/pgtable.h      | 564 +++++++++++++++++++++++---
> >  arch/riscv/include/asm/tlbflush.h     |  26 +-
> >  arch/riscv/include/uapi/asm/param.h   |  24 ++
> >  arch/riscv/kernel/head.S              |   4 +-
> >  arch/riscv/kernel/hibernate.c         |  21 +-
> >  arch/riscv/mm/context.c               |   7 +-
> >  arch/riscv/mm/fault.c                 |  15 +-
> >  arch/riscv/mm/hugetlbpage.c           |  30 +-
> >  arch/riscv/mm/init.c                  |  45 +-
> >  arch/riscv/mm/kasan_init.c            |   7 +-
> >  arch/riscv/mm/pgtable.c               | 111 ++++-
> >  arch/riscv/mm/tlbflush.c              |  31 +-
> >  arch/s390/include/asm/hugetlb.h       |   2 +-
> >  include/asm-generic/hugetlb.h         |   5 +-
> >  include/linux/pgtable.h               |  21 +
> >  kernel/events/core.c                  |   6 +-
> >  mm/debug_vm_pgtable.c                 |   6 +-
> >  mm/gup.c                              |  10 +-
> >  mm/hmm.c                              |   2 +-
> >  mm/hugetlb.c                          |   4 +-
> >  mm/mapping_dirty_helpers.c            |   2 +-
> >  mm/memory.c                           |   4 +-
> >  mm/mprotect.c                         |   2 +-
> >  mm/ptdump.c                           |   8 +-
> >  mm/sparse-vmemmap.c                   |   2 +-
> >  mm/vmscan.c                           |   2 +-
> >  33 files changed, 1029 insertions(+), 173 deletions(-)
> >  create mode 100644 arch/riscv/include/uapi/asm/param.h
> >
> > --
> > 2.20.1
>
>
> Best Regards,
> Yan, Zi



* Re: [RFC PATCH v2 00/21] riscv: Introduce 64K base page
  2024-12-06  2:00 ` [RFC PATCH v2 00/21] riscv: Introduce 64K base page Zi Yan
  2024-12-06  2:41   ` [External] " Xu Lu
@ 2024-12-06 10:13   ` David Hildenbrand
  2024-12-06 13:42     ` [External] " Xu Lu
  1 sibling, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2024-12-06 10:13 UTC (permalink / raw)
  To: Zi Yan, Xu Lu
  Cc: paul.walmsley, palmer, aou, ardb, anup, atishp, xieyongji,
	lihangjing, punit.agrawal, linux-kernel, linux-riscv, Linux MM

On 06.12.24 03:00, Zi Yan wrote:
> On 5 Dec 2024, at 5:37, Xu Lu wrote:
> 
>> This patch series attempts to break through the limitation of the MMU
>> and support a larger base page on RISC-V, which currently supports
>> only a 4K page size. The key idea is to always manage and allocate
>> memory at a granularity of 64K and use SVNAPOT to accelerate address
>> translation. This is the second version; a detailed introduction can
>> be found in [1].
>>
>> Changes from v1:
>> - Rebase on v6.12.
>>
>> - Adjust the page table entry shift to reduce page table memory usage.
>>      For example, in SV39, the traditional va behaves as:
>>
>>      ----------------------------------------------
>>      | pgd index | pmd index | pte index | offset |
>>      ----------------------------------------------
>>      | 38     30 | 29     21 | 20     12 | 11   0 |
>>      ----------------------------------------------
>>
>>      When we choose 64K as the basic software page, the va now behaves as:
>>
>>      ----------------------------------------------
>>      | pgd index | pmd index | pte index | offset |
>>      ----------------------------------------------
>>      | 38     34 | 33     25 | 24     16 | 15   0 |
>>      ----------------------------------------------
>>
>> - Fix some bugs in v1.
>>
>> Thanks in advance for comments.
>>
>> [1] https://lwn.net/Articles/952722/
> 
> This looks very interesting. Can you cc me and linux-mm@kvack.org
> in the future? Thanks.
> 
> Have you thought about doing it for ARM64 4KB as well? ARM64's contig PTE
> should have a similar effect to RISC-V's SVNAPOT, right?

What is the real benefit over 4k + large folios/mTHP?

64K comes with the problem of internal fragmentation: for example, a 
page table that only occupies 4k of memory suddenly consumes 64K; quite 
a downside.
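
To put a number on that, a minimal sketch of the rounding cost (the 4K
page-table figure above is an example, not a measurement):

#include <stdio.h>

/* bytes wasted when `need` bytes are allocated in `page`-sized units */
static unsigned long waste(unsigned long need, unsigned long page)
{
	return ((need + page - 1) / page) * page - need;
}

int main(void)
{
	printf("4K base page:  %lu bytes wasted\n", waste(4096, 4096));
	printf("64K base page: %lu bytes wasted\n", waste(4096, 65536));
	return 0;
}

which prints 0 and 61440: fifteen sixteenths of the 64K allocation
would be lost in this case.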

-- 
Cheers,

David / dhildenb




* Re: [External] Re: [RFC PATCH v2 00/21] riscv: Introduce 64K base page
  2024-12-06 10:13   ` David Hildenbrand
@ 2024-12-06 13:42     ` Xu Lu
  2024-12-06 18:48       ` Pedro Falcato
  0 siblings, 1 reply; 8+ messages in thread
From: Xu Lu @ 2024-12-06 13:42 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Zi Yan, paul.walmsley, palmer, aou, ardb, anup, atishp,
	xieyongji, lihangjing, punit.agrawal, linux-kernel, linux-riscv,
	Linux MM

Hi David,

On Fri, Dec 6, 2024 at 6:13 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 06.12.24 03:00, Zi Yan wrote:
> > On 5 Dec 2024, at 5:37, Xu Lu wrote:
> >
> >> This patch series attempts to break through the limitation of the MMU
> >> and support a larger base page on RISC-V, which currently supports
> >> only a 4K page size. The key idea is to always manage and allocate
> >> memory at a granularity of 64K and use SVNAPOT to accelerate address
> >> translation. This is the second version; a detailed introduction can
> >> be found in [1].
> >>
> >> Changes from v1:
> >> - Rebase on v6.12.
> >>
> >> - Adjust the page table entry shift to reduce page table memory usage.
> >>      For example, in SV39, the traditional va behaves as:
> >>
> >>      ----------------------------------------------
> >>      | pgd index | pmd index | pte index | offset |
> >>      ----------------------------------------------
> >>      | 38     30 | 29     21 | 20     12 | 11   0 |
> >>      ----------------------------------------------
> >>
> >>      When we choose 64K as the basic software page, the va now behaves as:
> >>
> >>      ----------------------------------------------
> >>      | pgd index | pmd index | pte index | offset |
> >>      ----------------------------------------------
> >>      | 38     34 | 33     25 | 24     16 | 15   0 |
> >>      ----------------------------------------------
> >>
> >> - Fix some bugs in v1.
> >>
> >> Thanks in advance for comments.
> >>
> >> [1] https://lwn.net/Articles/952722/
> >
> > This looks very interesting. Can you cc me and linux-mm@kvack.org
> > in the future? Thanks.
> >
> > Have you thought about doing it for ARM64 4KB as well? ARM64's contig PTE
> > should have a similar effect to RISC-V's SVNAPOT, right?
>
> What is the real benefit over 4k + large folios/mTHP?
>
> 64K comes with the problem of internal fragmentation: for example, a
> page table that only occupies 4k of memory suddenly consumes 64K; quite
> a downside.

The original idea comes from the performance benefits we achieved on
the ARM 64K kernel. We ran several real-world applications on the ARM
Ampere Altra platform and found that their performance on the 64K page
kernel is significantly higher than on the 4K page kernel:
For Redis, throughput increased by 250% and latency decreased by 70%.
For MySQL, throughput increased by 16.9% and latency decreased by
14.5%.
For our own NewSQL database, throughput increased by 16.5% and
latency decreased by 13.8%.

Also, we have compared the performance of 64K against 4K + large
folios/mTHP on ARM Neoverse-N2. The results show a considerable
performance improvement on the 64K kernel for both speccpu and
lmbench, even when the 4K kernel enables THP and ARM64_CONTPTE:
For the speccpu benchmark, the 64K kernel without any huge page
optimization still achieves a 4.17% higher score than the 4K kernel
with transparent huge pages and the CONTPTE optimization.
For lmbench, the 64K kernel achieves 75.98% lower memory mapping
latency (16MB) than the 4K kernel with transparent huge pages and the
CONTPTE optimization, 84.34% higher mmap read open2close bandwidth
(16MB), and 10.71% lower random load latency (16MB).
Interestingly, kernels with transparent huge page support sometimes
perform worse, for both 4K and 64K (for example, in the mmap read
bandwidth benchmark). We suspect this is due to the overhead of
assembling and collapsing huge pages.
Also, if you check the full results, you will find that the larger
the memory size used for testing, the better the 64K kernel performs
relative to the 4K kernel, unless the memory size lies in a range
where the 4K kernel can apply 2MB huge pages while the 64K kernel
cannot.
In summary, for performance-sensitive applications that require
higher bandwidth and lower latency, 4K pages plus huge pages may not
always be the best choice, and a 64K page size can achieve better
results. The test environment and results are attached.

As RISC-V has no native 64K MMU support, we introduce a software
implementation and accelerate it via Svnapot. Of course, there will
be some extra overhead compared with a native 64K MMU. Thus, we are
also trying to persuade the RISC-V community to support a native 64K
MMU extension [1]. Please join us if you are interested.

[1] https://lists.riscv.org/g/tech-privileged/topic/query_about_risc_v_s_support/108641509

Best Regards,

Xu Lu

>
> --
> Cheers,
>
> David / dhildenb
>

[-- Attachment #2: ARM 64K Result.xlsx --]
[-- Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, Size: 54752 bytes --]


* Re: [External] Re: [RFC PATCH v2 00/21] riscv: Introduce 64K base page
  2024-12-06 13:42     ` [External] " Xu Lu
@ 2024-12-06 18:48       ` Pedro Falcato
  2024-12-07  8:03         ` Xu Lu
  0 siblings, 1 reply; 8+ messages in thread
From: Pedro Falcato @ 2024-12-06 18:48 UTC (permalink / raw)
  To: Xu Lu
  Cc: David Hildenbrand, Zi Yan, paul.walmsley, palmer, aou, ardb,
	anup, atishp, xieyongji, lihangjing, punit.agrawal, linux-kernel,
	linux-riscv, Linux MM

On Fri, Dec 6, 2024 at 1:42 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
>
> Hi David,
>
> On Fri, Dec 6, 2024 at 6:13 PM David Hildenbrand <david@redhat.com> wrote:
> >
> > On 06.12.24 03:00, Zi Yan wrote:
> > > On 5 Dec 2024, at 5:37, Xu Lu wrote:
> > >
> > >> This patch series attempts to break through the limitation of the MMU
> > >> and support a larger base page on RISC-V, which currently supports
> > >> only a 4K page size. The key idea is to always manage and allocate
> > >> memory at a granularity of 64K and use SVNAPOT to accelerate address
> > >> translation. This is the second version; a detailed introduction can
> > >> be found in [1].
> > >>
> > >> Changes from v1:
> > >> - Rebase on v6.12.
> > >>
> > >> - Adjust the page table entry shift to reduce page table memory usage.
> > >>      For example, in SV39, the traditional va behaves as:
> > >>
> > >>      ----------------------------------------------
> > >>      | pgd index | pmd index | pte index | offset |
> > >>      ----------------------------------------------
> > >>      | 38     30 | 29     21 | 20     12 | 11   0 |
> > >>      ----------------------------------------------
> > >>
> > >>      When we choose 64K as the basic software page, the va now behaves as:
> > >>
> > >>      ----------------------------------------------
> > >>      | pgd index | pmd index | pte index | offset |
> > >>      ----------------------------------------------
> > >>      | 38     34 | 33     25 | 24     16 | 15   0 |
> > >>      ----------------------------------------------
> > >>
> > >> - Fix some bugs in v1.
> > >>
> > >> Thanks in advance for comments.
> > >>
> > >> [1] https://lwn.net/Articles/952722/
> > >
> > > This looks very interesting. Can you cc me and linux-mm@kvack.org
> > > in the future? Thanks.
> > >
> > > Have you thought about doing it for ARM64 4KB as well? ARM64's contig PTE
> > > should have a similar effect to RISC-V's SVNAPOT, right?
> >
> > What is the real benefit over 4k + large folios/mTHP?
> >
> > 64K comes with the problem of internal fragmentation: for example, a
> > page table that only occupies 4k of memory suddenly consumes 64K; quite
> > a downside.
>
> The original idea comes from the performance benefits we achieved on
> the ARM 64K kernel. We ran several real-world applications on the ARM
> Ampere Altra platform and found that their performance on the 64K page
> kernel is significantly higher than on the 4K page kernel:
> For Redis, throughput increased by 250% and latency decreased by 70%.
> For MySQL, throughput increased by 16.9% and latency decreased by
> 14.5%.
> For our own NewSQL database, throughput increased by 16.5% and
> latency decreased by 13.8%.
>
> Also, we have compared the performance of 64K against 4K + large
> folios/mTHP on ARM Neoverse-N2. The results show a considerable
> performance improvement on the 64K kernel for both speccpu and
> lmbench, even when the 4K kernel enables THP and ARM64_CONTPTE:
> For the speccpu benchmark, the 64K kernel without any huge page
> optimization still achieves a 4.17% higher score than the 4K kernel
> with transparent huge pages and the CONTPTE optimization.
> For lmbench, the 64K kernel achieves 75.98% lower memory mapping
> latency (16MB) than the 4K kernel with transparent huge pages and the
> CONTPTE optimization, 84.34% higher mmap read open2close bandwidth
> (16MB), and 10.71% lower random load latency (16MB).
> Interestingly, kernels with transparent huge page support sometimes
> perform worse, for both 4K and 64K (for example, in the mmap read
> bandwidth benchmark). We suspect this is due to the overhead of
> assembling and collapsing huge pages.
> Also, if you check the full results, you will find that the larger
> the memory size used for testing, the better the 64K kernel performs
> relative to the 4K kernel, unless the memory size lies in a range
> where the 4K kernel can apply 2MB huge pages while the 64K kernel
> cannot.
> In summary, for performance-sensitive applications that require
> higher bandwidth and lower latency, 4K pages plus huge pages may not
> always be the best choice, and a 64K page size can achieve better
> results. The test environment and results are attached.
>
> As RISC-V has no native 64K MMU support, we introduce a software
> implementation and accelerate it via Svnapot. Of course, there will
> be some extra overhead compared with a native 64K MMU. Thus, we are
> also trying to persuade the RISC-V community to support a native 64K
> MMU extension [1]. Please join us if you are interested.
>

Ok, so you... didn't test this on riscv? And you're basing this
patchset off of a native 64KiB page size kernel being faster than 4KiB
+ CONTPTE? I don't see how that makes sense?

/me is confused

How many of these PAGE_SIZE wins are related to e.g userspace basing
its buffer sizes (or whatever) off of the system page size? Where
exactly are you gaining time versus the CONTPTE stuff?
I think MM in general would be better off if we were more transparent
with regard to CONTPTE and page sizes instead of hand waving with
"hardware page size != software page size", which is such a *checks
notes* 4.4BSD idea... :) At the very least, this patchset seems to go
against all the work on better supporting large folios and CONTPTE.

-- 
Pedro



* Re: [External] Re: [RFC PATCH v2 00/21] riscv: Introduce 64K base page
  2024-12-06 18:48       ` Pedro Falcato
@ 2024-12-07  8:03         ` Xu Lu
  2024-12-07 22:02           ` Yu Zhao
  0 siblings, 1 reply; 8+ messages in thread
From: Xu Lu @ 2024-12-07  8:03 UTC (permalink / raw)
  To: Pedro Falcato
  Cc: David Hildenbrand, Zi Yan, paul.walmsley, palmer, aou, ardb,
	anup, atishp, xieyongji, lihangjing, punit.agrawal, linux-kernel,
	linux-riscv, Linux MM

Hi Pedro,

On Sat, Dec 7, 2024 at 2:49 AM Pedro Falcato <pedro.falcato@gmail.com> wrote:
>
> On Fri, Dec 6, 2024 at 1:42 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
> >
> > Hi David,
> >
> > On Fri, Dec 6, 2024 at 6:13 PM David Hildenbrand <david@redhat.com> wrote:
> > >
> > > On 06.12.24 03:00, Zi Yan wrote:
> > > > On 5 Dec 2024, at 5:37, Xu Lu wrote:
> > > >
> > > >> This patch series attempts to break through the limitation of the MMU
> > > >> and support a larger base page on RISC-V, which currently supports
> > > >> only a 4K page size. The key idea is to always manage and allocate
> > > >> memory at a granularity of 64K and use SVNAPOT to accelerate address
> > > >> translation. This is the second version; a detailed introduction can
> > > >> be found in [1].
> > > >>
> > > >> Changes from v1:
> > > >> - Rebase on v6.12.
> > > >>
> > > >> - Adjust the page table entry shift to reduce page table memory usage.
> > > >>      For example, in SV39, the traditional va behaves as:
> > > >>
> > > >>      ----------------------------------------------
> > > >>      | pgd index | pmd index | pte index | offset |
> > > >>      ----------------------------------------------
> > > >>      | 38     30 | 29     21 | 20     12 | 11   0 |
> > > >>      ----------------------------------------------
> > > >>
> > > >>      When we choose 64K as the basic software page, the va now behaves as:
> > > >>
> > > >>      ----------------------------------------------
> > > >>      | pgd index | pmd index | pte index | offset |
> > > >>      ----------------------------------------------
> > > >>      | 38     34 | 33     25 | 24     16 | 15   0 |
> > > >>      ----------------------------------------------
> > > >>
> > > >> - Fix some bugs in v1.
> > > >>
> > > >> Thanks in advance for comments.
> > > >>
> > > >> [1] https://lwn.net/Articles/952722/
> > > >
> > > > This looks very interesting. Can you cc me and linux-mm@kvack.org
> > > > in the future? Thanks.
> > > >
> > > > Have you thought about doing it for ARM64 4KB as well? ARM64's contig PTE
> > > > should have a similar effect to RISC-V's SVNAPOT, right?
> > >
> > > What is the real benefit over 4k + large folios/mTHP?
> > >
> > > 64K comes with the problem of internal fragmentation: for example, a
> > > page table that only occupies 4k of memory suddenly consumes 64K; quite
> > > a downside.
> >
> > The original idea comes from the performance benefits we achieved on
> > the ARM 64K kernel. We ran several real-world applications on the ARM
> > Ampere Altra platform and found that their performance on the 64K page
> > kernel is significantly higher than on the 4K page kernel:
> > For Redis, throughput increased by 250% and latency decreased by 70%.
> > For MySQL, throughput increased by 16.9% and latency decreased by
> > 14.5%.
> > For our own NewSQL database, throughput increased by 16.5% and
> > latency decreased by 13.8%.
> >
> > Also, we have compared the performance of 64K against 4K + large
> > folios/mTHP on ARM Neoverse-N2. The results show a considerable
> > performance improvement on the 64K kernel for both speccpu and
> > lmbench, even when the 4K kernel enables THP and ARM64_CONTPTE:
> > For the speccpu benchmark, the 64K kernel without any huge page
> > optimization still achieves a 4.17% higher score than the 4K kernel
> > with transparent huge pages and the CONTPTE optimization.
> > For lmbench, the 64K kernel achieves 75.98% lower memory mapping
> > latency (16MB) than the 4K kernel with transparent huge pages and the
> > CONTPTE optimization, 84.34% higher mmap read open2close bandwidth
> > (16MB), and 10.71% lower random load latency (16MB).
> > Interestingly, kernels with transparent huge page support sometimes
> > perform worse, for both 4K and 64K (for example, in the mmap read
> > bandwidth benchmark). We suspect this is due to the overhead of
> > assembling and collapsing huge pages.
> > Also, if you check the full results, you will find that the larger
> > the memory size used for testing, the better the 64K kernel performs
> > relative to the 4K kernel, unless the memory size lies in a range
> > where the 4K kernel can apply 2MB huge pages while the 64K kernel
> > cannot.
> > In summary, for performance-sensitive applications that require
> > higher bandwidth and lower latency, 4K pages plus huge pages may not
> > always be the best choice, and a 64K page size can achieve better
> > results. The test environment and results are attached.
> >
> > As RISC-V has no native 64K MMU support, we introduce a software
> > implementation and accelerate it via Svnapot. Of course, there will
> > be some extra overhead compared with a native 64K MMU. Thus, we are
> > also trying to persuade the RISC-V community to support a native 64K
> > MMU extension [1]. Please join us if you are interested.
> >
>
> Ok, so you... didn't test this on riscv? And you're basing this
> patchset off of a native 64KiB page size kernel being faster than 4KiB
> + CONTPTE? I don't see how that makes sense?

Sorry for the confusion. I didn't intend to use the ARM data to
justify this patch series, only to explain where the idea came from.
We do prefer a 64K MMU for the performance improvement it brings to
real applications and benchmarks. And since RISC-V does not support
it yet, we use this series internally as a transitional solution. And
if a native 64K MMU becomes available, this series can be dropped.
The only remaining use of this series I can think of is to make the
kernel support more page sizes than the MMU does, as long as Svnapot
supports the corresponding size.

We will try to release the performance data in the next version.
There are still some application and OS adaptation issues to work
through, so this version is still an RFC. :)

>
> /me is confused
>
> How many of these PAGE_SIZE wins are related to e.g. userspace basing
> its buffer sizes (or whatever) off of the system page size? Where
> exactly are you gaining time versus the CONTPTE stuff?
> I think MM in general would be better off if we were more transparent
> with regard to CONTPTE and page sizes instead of hand waving with
> "hardware page size != software page size", which is such a *checks
> notes* 4.4BSD idea... :) At the very least, this patchset seems to go
> against all the work on better supporting large folios and CONTPTE.

By the way, the core modification of this series is turning the pte
structure into an array of 16 hardware entries to map a 64K page and
accelerating it via Svnapot. I think it is all about the architectural
pte and has little impact on pages or folios. Please let me know if
anything is missed and I will try to fix it.
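
A rough mock-up of that idea (hypothetical types and helper, not the
patch itself; the 64K NAPOT encoding sets the N bit and places 0b1000
in ppn[3:0], per the Svnapot spec):

#include <stdint.h>

#define HW_PTES_PER_SW_PTE 16         /* 64K software page / 4K hw page */
#define _PAGE_NAPOT (1ULL << 63)      /* Svnapot N bit */

/* one software PTE is an array of 16 hardware PTEs */
typedef struct {
	uint64_t ptes[HW_PTES_PER_SW_PTE];
} sw_pte_t;

/* hypothetical constructor for one 64K-aligned, 64K-sized mapping */
static void sw_pte_set(sw_pte_t *p, uint64_t pfn, uint64_t prot)
{
	uint64_t ppn = (pfn & ~0xfULL) | 0x8ULL;  /* 64K NAPOT ppn */
	int i;

	/* all 16 entries are identical, so the MMU may cache the
	 * whole 64K range with a single TLB entry */
	for (i = 0; i < HW_PTES_PER_SW_PTE; i++)
		p->ptes[i] = _PAGE_NAPOT | (ppn << 10) | prot;
}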

>
> --
> Pedro

Thanks,

Xu Lu



* Re: [External] Re: [RFC PATCH v2 00/21] riscv: Introduce 64K base page
  2024-12-07  8:03         ` Xu Lu
@ 2024-12-07 22:02           ` Yu Zhao
  2024-12-09  3:36             ` Xu Lu
  0 siblings, 1 reply; 8+ messages in thread
From: Yu Zhao @ 2024-12-07 22:02 UTC (permalink / raw)
  To: Xu Lu
  Cc: Pedro Falcato, David Hildenbrand, Zi Yan, paul.walmsley, palmer,
	aou, ardb, anup, atishp, xieyongji, lihangjing, punit.agrawal,
	linux-kernel, linux-riscv, Linux MM

On Sat, Dec 7, 2024 at 1:03 AM Xu Lu <luxu.kernel@bytedance.com> wrote:
>
> Hi Pedro,
>
> On Sat, Dec 7, 2024 at 2:49 AM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> >
> > On Fri, Dec 6, 2024 at 1:42 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
> > >
> > > Hi David,
> > >
> > > On Fri, Dec 6, 2024 at 6:13 PM David Hildenbrand <david@redhat.com> wrote:
> > > >
> > > > On 06.12.24 03:00, Zi Yan wrote:
> > > > > On 5 Dec 2024, at 5:37, Xu Lu wrote:
> > > > >
> > > > >> This patch series attempts to break through the limitation of the MMU
> > > > >> and support a larger base page on RISC-V, which currently supports
> > > > >> only a 4K page size. The key idea is to always manage and allocate
> > > > >> memory at a granularity of 64K and use SVNAPOT to accelerate address
> > > > >> translation. This is the second version; a detailed introduction can
> > > > >> be found in [1].
> > > > >>
> > > > >> Changes from v1:
> > > > >> - Rebase on v6.12.
> > > > >>
> > > > >> - Adjust the page table entry shift to reduce page table memory usage.
> > > > >>      For example, in SV39, the traditional va behaves as:
> > > > >>
> > > > >>      ----------------------------------------------
> > > > >>      | pgd index | pmd index | pte index | offset |
> > > > >>      ----------------------------------------------
> > > > >>      | 38     30 | 29     21 | 20     12 | 11   0 |
> > > > >>      ----------------------------------------------
> > > > >>
> > > > >>      When we choose 64K as the basic software page, the va now behaves as:
> > > > >>
> > > > >>      ----------------------------------------------
> > > > >>      | pgd index | pmd index | pte index | offset |
> > > > >>      ----------------------------------------------
> > > > >>      | 38     34 | 33     25 | 24     16 | 15   0 |
> > > > >>      ----------------------------------------------
> > > > >>
> > > > >> - Fix some bugs in v1.
> > > > >>
> > > > >> Thanks in advance for comments.
> > > > >>
> > > > >> [1] https://lwn.net/Articles/952722/
> > > > >
> > > > > This looks very interesting. Can you cc me and linux-mm@kvack.org
> > > > > in the future? Thanks.
> > > > >
> > > > > Have you thought about doing it for ARM64 4KB as well? ARM64's contig PTE
> > > > > should have a similar effect to RISC-V's SVNAPOT, right?
> > > >
> > > > What is the real benefit over 4k + large folios/mTHP?
> > > >
> > > > 64K comes with the problem of internal fragmentation: for example, a
> > > > page table that only occupies 4k of memory suddenly consumes 64K; quite
> > > > a downside.
> > >
> > > The original idea comes from the performance benefits we achieved on
> > > the ARM 64K kernel. We ran several real-world applications on the ARM
> > > Ampere Altra platform and found that their performance on the 64K page
> > > kernel is significantly higher than on the 4K page kernel:
> > > For Redis, throughput increased by 250% and latency decreased by 70%.
> > > For MySQL, throughput increased by 16.9% and latency decreased by
> > > 14.5%.
> > > For our own NewSQL database, throughput increased by 16.5% and
> > > latency decreased by 13.8%.
> > >
> > > Also, we have compared the performance of 64K against 4K + large
> > > folios/mTHP on ARM Neoverse-N2. The results show a considerable
> > > performance improvement on the 64K kernel for both speccpu and
> > > lmbench, even when the 4K kernel enables THP and ARM64_CONTPTE:
> > > For the speccpu benchmark, the 64K kernel without any huge page
> > > optimization still achieves a 4.17% higher score than the 4K kernel
> > > with transparent huge pages and the CONTPTE optimization.
> > > For lmbench, the 64K kernel achieves 75.98% lower memory mapping
> > > latency (16MB) than the 4K kernel with transparent huge pages and the
> > > CONTPTE optimization, 84.34% higher mmap read open2close bandwidth
> > > (16MB), and 10.71% lower random load latency (16MB).
> > > Interestingly, kernels with transparent huge page support sometimes
> > > perform worse, for both 4K and 64K (for example, in the mmap read
> > > bandwidth benchmark). We suspect this is due to the overhead of
> > > assembling and collapsing huge pages.
> > > Also, if you check the full results, you will find that the larger
> > > the memory size used for testing, the better the 64K kernel performs
> > > relative to the 4K kernel, unless the memory size lies in a range
> > > where the 4K kernel can apply 2MB huge pages while the 64K kernel
> > > cannot.
> > > In summary, for performance-sensitive applications that require
> > > higher bandwidth and lower latency, 4K pages plus huge pages may not
> > > always be the best choice, and a 64K page size can achieve better
> > > results. The test environment and results are attached.
> > >
> > > As RISC-V has no native 64K MMU support, we introduce a software
> > > implementation and accelerate it via Svnapot. Of course, there will
> > > be some extra overhead compared with a native 64K MMU. Thus, we are
> > > also trying to persuade the RISC-V community to support a native 64K
> > > MMU extension [1]. Please join us if you are interested.
> > >
> >
> > Ok, so you... didn't test this on riscv? And you're basing this
> > patchset off of a native 64KiB page size kernel being faster than 4KiB
> > + CONTPTE? I don't see how that makes sense?
>
> Sorry for the confusion. I didn't intend to use the ARM data to
> justify this patch series, only to explain where the idea came from.
> We do prefer a 64K MMU for the performance improvement it brings to
> real applications and benchmarks.

This breaks the ABI, doesn't it? Not only does userspace need to be
recompiled with 64KB alignment, it also must not assume a 4KB base
page size.
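
A two-line illustration of the userspace assumption at stake
(illustrative only):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
	long assumed = 4096;                  /* silently wrong on a 64K kernel */
	long actual = sysconf(_SC_PAGESIZE);  /* tracks the running kernel */

	printf("assumed %ld, actual %ld\n", assumed, actual);
	return 0;
}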

> And since RISC-V does not support it yet, we use this series
> internally as a transitional solution.

Distros need to support this as well. Otherwise it's a tech island.
Also why RV? It can be a generic feature which can apply to other
archs like x86, right? See "page clustering" [1][2].

[1] https://lwn.net/Articles/23785/
[2] https://lore.kernel.org/linux-mm/Pine.LNX.4.21.0107051737340.1577-100000@localhost.localdomain/

> And if a native 64K MMU becomes available, this series can be
> dropped.

Why 64KB? Why not 32KB or 128KB? In general, the less dependency on
h/w, the better. Ideally, *if* we want to consider this, it should be
a s/w feature applicable to all (or most) archs.


> The only remaining use of this series I can think of is to make the
> kernel support more page sizes than the MMU does, as long as Svnapot
> supports the corresponding size.
>
> We will try to release the performance data in the next version.
> There are still some application and OS adaptation issues to work
> through, so this version is still an RFC. :)
>
> >
> > /me is confused
> >
> > How many of these PAGE_SIZE wins are related to e.g. userspace basing
> > its buffer sizes (or whatever) off of the system page size? Where
> > exactly are you gaining time versus the CONTPTE stuff?
> > I think MM in general would be better off if we were more transparent
> > with regard to CONTPTE and page sizes instead of hand waving with
> > "hardware page size != software page size", which is such a *checks
> > notes* 4.4BSD idea... :) At the very least, this patchset seems to go
> > against all the work on better supporting large folios and CONTPTE.
>
> By the way, the core modification of this series is turning the pte
> structure into an array of 16 hardware entries to map a 64K page and
> accelerating it via Svnapot. I think it is all about the architectural
> pte and has little impact on pages or folios. Please let me know if
> anything is missed and I will try to fix it.
>
> >
> > --
> > Pedro
>
> Thanks,
>
> Xu Lu
>



* Re: [External] Re: [RFC PATCH v2 00/21] riscv: Introduce 64K base page
  2024-12-07 22:02           ` Yu Zhao
@ 2024-12-09  3:36             ` Xu Lu
  0 siblings, 0 replies; 8+ messages in thread
From: Xu Lu @ 2024-12-09  3:36 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Pedro Falcato, David Hildenbrand, Zi Yan, paul.walmsley, palmer,
	aou, ardb, anup, atishp, xieyongji, lihangjing, punit.agrawal,
	linux-kernel, linux-riscv, Linux MM

Hi Yu Zhao,

On Sun, Dec 8, 2024 at 6:03 AM Yu Zhao <yuzhao@google.com> wrote:
>
> On Sat, Dec 7, 2024 at 1:03 AM Xu Lu <luxu.kernel@bytedance.com> wrote:
> >
> > Hi Pedro,
> >
> > On Sat, Dec 7, 2024 at 2:49 AM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> > >
> > > On Fri, Dec 6, 2024 at 1:42 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
> > > >
> > > > Hi David,
> > > >
> > > > On Fri, Dec 6, 2024 at 6:13 PM David Hildenbrand <david@redhat.com> wrote:
> > > > >
> > > > > On 06.12.24 03:00, Zi Yan wrote:
> > > > > > On 5 Dec 2024, at 5:37, Xu Lu wrote:
> > > > > >
> > > > > >> This patch series attempts to break through the limitation of the MMU
> > > > > >> and support a larger base page on RISC-V, which currently supports
> > > > > >> only a 4K page size. The key idea is to always manage and allocate
> > > > > >> memory at a granularity of 64K and use SVNAPOT to accelerate address
> > > > > >> translation. This is the second version; a detailed introduction can
> > > > > >> be found in [1].
> > > > > >>
> > > > > >> Changes from v1:
> > > > > >> - Rebase on v6.12.
> > > > > >>
> > > > > >> - Adjust the page table entry shift to reduce page table memory usage.
> > > > > >>      For example, in SV39, the traditional va behaves as:
> > > > > >>
> > > > > >>      ----------------------------------------------
> > > > > >>      | pgd index | pmd index | pte index | offset |
> > > > > >>      ----------------------------------------------
> > > > > >>      | 38     30 | 29     21 | 20     12 | 11   0 |
> > > > > >>      ----------------------------------------------
> > > > > >>
> > > > > >>      When we choose 64K as the basic software page, the va now behaves as:
> > > > > >>
> > > > > >>      ----------------------------------------------
> > > > > >>      | pgd index | pmd index | pte index | offset |
> > > > > >>      ----------------------------------------------
> > > > > >>      | 38     34 | 33     25 | 24     16 | 15   0 |
> > > > > >>      ----------------------------------------------
> > > > > >>
> > > > > >> - Fix some bugs in v1.
> > > > > >>
> > > > > >> Thanks in advance for comments.
> > > > > >>
> > > > > >> [1] https://lwn.net/Articles/952722/
> > > > > >
> > > > > > This looks very interesting. Can you cc me and linux-mm@kvack.org
> > > > > > in the future? Thanks.
> > > > > >
> > > > > > Have you thought about doing it for ARM64 4KB as well? ARM64's contig PTE
> > > > > > should have a similar effect to RISC-V's SVNAPOT, right?
> > > > >
> > > > > What is the real benefit over 4k + large folios/mTHP?
> > > > >
> > > > > 64K comes with the problem of internal fragmentation: for example, a
> > > > > page table that only occupies 4k of memory suddenly consumes 64K; quite
> > > > > a downside.
> > > >
> > > > The original idea comes from the performance benefits we achieved on
> > > > the ARM 64K kernel. We ran several real-world applications on the ARM
> > > > Ampere Altra platform and found that their performance on the 64K page
> > > > kernel is significantly higher than on the 4K page kernel:
> > > > For Redis, throughput increased by 250% and latency decreased by 70%.
> > > > For MySQL, throughput increased by 16.9% and latency decreased by
> > > > 14.5%.
> > > > For our own NewSQL database, throughput increased by 16.5% and
> > > > latency decreased by 13.8%.
> > > >
> > > > Also, we have compared the performance of 64K against 4K + large
> > > > folios/mTHP on ARM Neoverse-N2. The results show a considerable
> > > > performance improvement on the 64K kernel for both speccpu and
> > > > lmbench, even when the 4K kernel enables THP and ARM64_CONTPTE:
> > > > For the speccpu benchmark, the 64K kernel without any huge page
> > > > optimization still achieves a 4.17% higher score than the 4K kernel
> > > > with transparent huge pages and the CONTPTE optimization.
> > > > For lmbench, the 64K kernel achieves 75.98% lower memory mapping
> > > > latency (16MB) than the 4K kernel with transparent huge pages and the
> > > > CONTPTE optimization, 84.34% higher mmap read open2close bandwidth
> > > > (16MB), and 10.71% lower random load latency (16MB).
> > > > Interestingly, kernels with transparent huge page support sometimes
> > > > perform worse, for both 4K and 64K (for example, in the mmap read
> > > > bandwidth benchmark). We suspect this is due to the overhead of
> > > > assembling and collapsing huge pages.
> > > > Also, if you check the full results, you will find that the larger
> > > > the memory size used for testing, the better the 64K kernel performs
> > > > relative to the 4K kernel, unless the memory size lies in a range
> > > > where the 4K kernel can apply 2MB huge pages while the 64K kernel
> > > > cannot.
> > > > In summary, for performance-sensitive applications that require
> > > > higher bandwidth and lower latency, 4K pages plus huge pages may not
> > > > always be the best choice, and a 64K page size can achieve better
> > > > results. The test environment and results are attached.
> > > >
> > > > As RISC-V has no native 64K MMU support, we introduce a software
> > > > implementation and accelerate it via Svnapot. Of course, there will
> > > > be some extra overhead compared with a native 64K MMU. Thus, we are
> > > > also trying to persuade the RISC-V community to support a native 64K
> > > > MMU extension [1]. Please join us if you are interested.
> > > >
> > >
> > > Ok, so you... didn't test this on riscv? And you're basing this
> > > patchset off of a native 64KiB page size kernel being faster than 4KiB
> > > + CONTPTE? I don't see how that makes sense?
> >
> > Sorry for the confusion. I didn't intend to use the ARM data to
> > justify this patch series, only to explain where the idea came from.
> > We do prefer a 64K MMU for the performance improvement it brings to
> > real applications and benchmarks.
>
> This breaks the ABI, doesn't it? Not only does userspace need to be
> recompiled with 64KB alignment, it also must not assume a 4KB base
> page size.

Yes, it does.

>
> > And since RISC-V does not support it yet, we use this series
> > internally as a transitional solution.
>
> Distros need to support this as well. Otherwise it's a tech island.
> Also why RV? It can be a generic feature which can apply to other
> archs like x86, right? See "page clustering" [1][2].
>
> [1] https://lwn.net/Articles/23785/
> [2] https://lore.kernel.org/linux-mm/Pine.LNX.4.21.0107051737340.1577-100000@localhost.localdomain/
>
> > And if a native 64K MMU becomes available, this series can be
> > dropped.
>
> Why 64KB? Why not 32KB or 128KB? In general, the less dependency on
> h/w, the better. Ideally, *if* we want to consider this, it should be
> a s/w feature applicable to all (or most) archs.

We chose RISC-V because of internal business needs, and chose 64K
because of the benefits we have achieved on the ARM 64K kernel.

It is a pretty ambitious goal to apply such a feature to all
architectures. We would be glad to do so, and will ask for more
assistance if everyone agrees it is worthwhile. But for now, perhaps
it is better to try it on RV first? After all, not all architectures
support features like Svnapot or CONTPTE. Of course, even for
architectures without Svnapot, a bigger page size can still achieve
lower metadata memory overhead and fewer page faults.

We are pleased to see that similar ideas have been considered before.
We have great respect for William Lee Irwin and Hugh Dickins and hope
they can continue this work. We will cc them on future emails.
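
As a back-of-the-envelope sketch of the metadata point (the 64-byte
struct page size here is an assumption; the real size is
config-dependent):

#include <stdio.h>

int main(void)
{
	const double struct_page = 64.0;  /* assumed bytes per struct page */

	printf("4K pages:  %.2f%% of RAM for the memmap\n",
	       100.0 * struct_page / 4096);
	printf("64K pages: %.3f%% of RAM for the memmap\n",
	       100.0 * struct_page / 65536);
	return 0;
}

That is roughly 1.56% versus 0.098% of physical memory.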

Best Regards,

Xu Lu

>
>
> > The only remaining use of this series I can think of is to make the
> > kernel support more page sizes than the MMU does, as long as Svnapot
> > supports the corresponding size.
> >
> > We will try to release the performance data in the next version.
> > There are still some application and OS adaptation issues to work
> > through, so this version is still an RFC. :)
> >
> > >
> > > /me is confused
> > >
> > > How many of these PAGE_SIZE wins are related to e.g. userspace basing
> > > its buffer sizes (or whatever) off of the system page size? Where
> > > exactly are you gaining time versus the CONTPTE stuff?
> > > I think MM in general would be better off if we were more transparent
> > > with regard to CONTPTE and page sizes instead of hand waving with
> > > "hardware page size != software page size", which is such a *checks
> > > notes* 4.4BSD idea... :) At the very least, this patchset seems to go
> > > against all the work on better supporting large folios and CONTPTE.
> >
> > By the way, the core modification of this series is turning the pte
> > structure into an array of 16 hardware entries to map a 64K page and
> > accelerating it via Svnapot. I think it is all about the architectural
> > pte and has little impact on pages or folios. Please let me know if
> > anything is missed and I will try to fix it.
> >
> > >
> > > --
> > > Pedro
> >
> > Thanks,
> >
> > Xu Lu
> >


