Re: [PATCH v3 00/15] Transparent Contiguous PTEs for User Mappings

From: John Hubbard <jhubbard@nvidia.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>, Ard Biesheuvel <ardb@kernel.org>,
	Marc Zyngier <maz@kernel.org>,
	Oliver Upton <oliver.upton@linux.dev>,
	James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>,
	Andrey Ryabinin <ryabinin.a.a@gmail.com>,
	Alexander Potapenko <glider@google.com>,
	"Andrey Konovalov" <andreyknvl@gmail.com>,
	Dmitry Vyukov <dvyukov@google.com>,
	Vincenzo Frascino <vincenzo.frascino@arm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Matthew Wilcox <willy@infradead.org>, Yu Zhao <yuzhao@google.com>,
	"Mark Rutland" <mark.rutland@arm.com>,
	David Hildenbrand <david@redhat.com>,
	"Kefeng Wang" <wangkefeng.wang@huawei.com>,
	Zi Yan <ziy@nvidia.com>, Barry Song <21cnbao@gmail.com>,
	Alistair Popple <apopple@nvidia.com>,
	Yang Shi <shy828301@gmail.com>
Cc: <linux-arm-kernel@lists.infradead.org>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 00/15] Transparent Contiguous PTEs for User Mappings
Date: Mon, 4 Dec 2023 19:41:02 -0800
Message-ID: <ea345a14-0a39-425c-a2df-d163ca948f57@nvidia.com>
In-Reply-To: <20231204105440.61448-1-ryan.roberts@arm.com>

On 12/4/23 02:54, Ryan Roberts wrote:
> Hi All,
> 
> This is v3 of a series to opportunistically and transparently use contpte
> mappings (set the contiguous bit in ptes) for user memory when those mappings
> meet the requirements. It is part of a wider effort to improve performance by
> allocating and mapping variable-sized blocks of memory (folios). One aim is for
> the 4K kernel to approach the performance of the 16K kernel, but without
> breaking compatibility and without the associated increase in memory. Another
> aim is to benefit the 16K and 64K kernels by enabling 2M THP, since this is the
> contpte size for those kernels. We have good performance data that demonstrates
> both aims are being met (see below).
> 
> Of course this is only one half of the change. We require the mapped physical
> memory to be the correct size and alignment for this to actually be useful (i.e.
> 64K for 4K pages, or 2M for 16K/64K pages). Fortunately folios are solving this
> problem for us. Filesystems that support it (XFS, AFS, EROFS, tmpfs, ...) will
> allocate large folios up to the PMD size today, and more filesystems are coming.
> And the other half of my work, to enable "multi-size THP" (large folios) for
> anonymous memory, makes contpte sized folios prevalent for anonymous memory too
> [3].
> 
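
For readers unfamiliar with the sizes quoted above, the arithmetic can be
sketched as follows. This is an illustration only, not code from the series:
the names CONT_PTES, contpte_size() and is_contpte_block() are hypothetical,
and the per-granule entry counts (16, 128 and 32 PTEs) are the arm64
contiguous-hint block sizes as I understand them, which give the 64K and 2M
figures in the cover letter.

```python
# Illustrative sketch only; none of these names exist in the kernel.
KIB = 1024
MIB = 1024 * KIB

# Assumed arm64 geometry: base page size (bytes) -> PTEs per contiguous block.
CONT_PTES = {4 * KIB: 16, 16 * KIB: 128, 64 * KIB: 32}


def contpte_size(page_size: int) -> int:
    """Return the bytes covered by one contiguous-PTE block."""
    return page_size * CONT_PTES[page_size]


def is_contpte_block(vaddr: int, paddr: int, length: int, page_size: int) -> bool:
    """Check whether a mapping is exactly one naturally aligned block.

    Both the virtual and the physical address must be aligned to the block
    size, and the mapping must cover the whole block.
    """
    size = contpte_size(page_size)
    return vaddr % size == 0 and paddr % size == 0 and length == size


if __name__ == "__main__":
    for page_size in sorted(CONT_PTES):
        print(page_size // KIB, "K pages ->", contpte_size(page_size) // KIB, "K block")
```

With 4K pages, a 64K folio mapped at a 64K-aligned virtual address passes this
check, while the same folio mapped at an offset of one page does not.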

Hi Ryan,

Using a couple of Armv8 systems, I've tested this patchset. Details are in my
reply to the mTHP patchset [1].

So for this patchset, please feel free to add:

Tested-by: John Hubbard <jhubbard@nvidia.com>


[1] https://lore.kernel.org/all/2be046e1-ef95-4244-ae23-e56071ae1218@nvidia.com/


thanks,
-- 
John Hubbard
NVIDIA

> Optimistically, I would really like to get this series merged for v6.8; there is
> a chance that the multi-size THP series will also get merged for that version
> (although at this point pretty small). But even if it doesn't, this series still
> benefits file-backed memory from the file systems that support large folios so
> shouldn't be held up for it. Additionally I've got data that shows this series
> adds no regression when the system has no appropriate large folios.
> 
> All dependencies listed against v1 are now resolved; this series applies cleanly
> against v6.7-rc1.
> 
> Note that the first two patches are for core-mm and provide the refactoring to
> make some crucial optimizations possible - which are then implemented in patches
> 14 and 15. The remaining patches are arm64-specific.
> 
> Testing
> =======
> 
> I've tested this series together with multi-size THP [3] on both Ampere Altra
> (bare metal) and Apple M2 (VM):
>    - mm selftests (inc new tests written for multi-size THP); no regressions
>    - Speedometer JavaScript benchmark in Chromium web browser; no issues
>    - Kernel compilation; no issues
>    - Various tests under high memory pressure with swap enabled; no issues
> 
> 
> Performance
> ===========
> 
> John Hubbard at Nvidia has indicated dramatic 10x performance improvements for
> some workloads at [4], when using 64K base page kernel.
> 
> You can also see the original performance results I posted against v1 [1] which
> are still valid.
> 
> I've additionally run the kernel compilation and speedometer benchmarks on a
> system with multi-size THP disabled and large folio support for file-backed
> memory intentionally disabled; I see no change in performance in this case (i.e.
> no regression when this change is "present but not useful").
> 
> 
> Changes since v2 [2]
> ====================
> 
>    - Removed contpte_ptep_get_and_clear_full() optimisation for exit() (v2#14),
>      and replaced with a batch-clearing approach using a new arch helper,
>      clear_ptes() (v3#2 and v3#15) (Alistair and Barry)
>    - (v2#1 / v3#1)
>        - Fixed folio refcounting so that refcount >= mapcount always (DavidH)
>        - Reworked batch demarcation to avoid pte_pgprot() (DavidH)
>        - Reverted return semantic of copy_present_page() and instead fix it up in
>          copy_present_ptes() (Alistair)
>        - Removed page_cont_mapped_vaddr() and replaced with simpler logic
>          (Alistair)
>        - Made batch accounting clearer in copy_pte_range() (Alistair)
>    - (v2#12 / v3#13)
>        - Renamed contpte_fold() -> contpte_convert() and hoisted setting/
>          clearing CONT_PTE bit to higher level (Alistair)
> 
> 
> Changes since v1 [1]
> ====================
> 
>    - Export contpte_* symbols so that modules can continue to call inline
>      functions (e.g. ptep_get) which may now call the contpte_* functions (thanks
>      to JohnH)
>    - Use pte_valid() instead of pte_present() where sensible (thanks to Catalin)
>    - Factor out (pte_valid() && pte_cont()) into new pte_valid_cont() helper
>      (thanks to Catalin)
>    - Fixed bug in contpte_ptep_set_access_flags() where TLBIs were missed (thanks
>      to Catalin)
>    - Added ARM64_CONTPTE expert Kconfig (enabled by default) (thanks to Anshuman)
>    - Simplified contpte_ptep_get_and_clear_full()
>    - Improved various code comments
> 
> 
> [1] https://lore.kernel.org/linux-arm-kernel/20230622144210.2623299-1-ryan.roberts@arm.com/
> [2] https://lore.kernel.org/linux-arm-kernel/20231115163018.1303287-1-ryan.roberts@arm.com/
> [3] https://lore.kernel.org/linux-arm-kernel/20231204102027.57185-1-ryan.roberts@arm.com/
> [4] https://lore.kernel.org/linux-mm/c507308d-bdd4-5f9e-d4ff-e96e4520be85@nvidia.com/
> 
> 
> Thanks,
> Ryan
> 
> Ryan Roberts (15):
>    mm: Batch-copy PTE ranges during fork()
>    mm: Batch-clear PTE ranges during zap_pte_range()
>    arm64/mm: set_pte(): New layer to manage contig bit
>    arm64/mm: set_ptes()/set_pte_at(): New layer to manage contig bit
>    arm64/mm: pte_clear(): New layer to manage contig bit
>    arm64/mm: ptep_get_and_clear(): New layer to manage contig bit
>    arm64/mm: ptep_test_and_clear_young(): New layer to manage contig bit
>    arm64/mm: ptep_clear_flush_young(): New layer to manage contig bit
>    arm64/mm: ptep_set_wrprotect(): New layer to manage contig bit
>    arm64/mm: ptep_set_access_flags(): New layer to manage contig bit
>    arm64/mm: ptep_get(): New layer to manage contig bit
>    arm64/mm: Split __flush_tlb_range() to elide trailing DSB
>    arm64/mm: Wire up PTE_CONT for user mappings
>    arm64/mm: Implement ptep_set_wrprotects() to optimize fork()
>    arm64/mm: Implement clear_ptes() to optimize exit()
> 
>   arch/arm64/Kconfig                |  10 +-
>   arch/arm64/include/asm/pgtable.h  | 343 ++++++++++++++++++++---
>   arch/arm64/include/asm/tlbflush.h |  13 +-
>   arch/arm64/kernel/efi.c           |   4 +-
>   arch/arm64/kernel/mte.c           |   2 +-
>   arch/arm64/kvm/guest.c            |   2 +-
>   arch/arm64/mm/Makefile            |   1 +
>   arch/arm64/mm/contpte.c           | 436 ++++++++++++++++++++++++++++++
>   arch/arm64/mm/fault.c             |  12 +-
>   arch/arm64/mm/fixmap.c            |   4 +-
>   arch/arm64/mm/hugetlbpage.c       |  40 +--
>   arch/arm64/mm/kasan_init.c        |   6 +-
>   arch/arm64/mm/mmu.c               |  16 +-
>   arch/arm64/mm/pageattr.c          |   6 +-
>   arch/arm64/mm/trans_pgd.c         |   6 +-
>   include/asm-generic/tlb.h         |   9 +
>   include/linux/pgtable.h           |  39 +++
>   mm/memory.c                       | 258 +++++++++++++-----
>   mm/mmu_gather.c                   |  14 +
>   19 files changed, 1067 insertions(+), 154 deletions(-)
>   create mode 100644 arch/arm64/mm/contpte.c
> 
> --
> 2.25.1
>
