From: Barry Song <21cnbao@gmail.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>, Ard Biesheuvel <ardb@kernel.org>,
Marc Zyngier <maz@kernel.org>,
Oliver Upton <oliver.upton@linux.dev>,
James Morse <james.morse@arm.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
Zenghui Yu <yuzenghui@huawei.com>,
Andrey Ryabinin <ryabinin.a.a@gmail.com>,
Alexander Potapenko <glider@google.com>,
Andrey Konovalov <andreyknvl@gmail.com>,
Dmitry Vyukov <dvyukov@google.com>,
Vincenzo Frascino <vincenzo.frascino@arm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Matthew Wilcox <willy@infradead.org>,
Yu Zhao <yuzhao@google.com>, Mark Rutland <mark.rutland@arm.com>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v1 00/14] Transparent Contiguous PTEs for User Mappings
Date: Mon, 10 Jul 2023 20:05:19 +0800 [thread overview]
Message-ID: <CAGsJ_4x4b5Qe2RNTHyR2MQqSkRxcuchrUgrap9WTaDtuMUttcA@mail.gmail.com> (raw)
In-Reply-To: <20230622144210.2623299-1-ryan.roberts@arm.com>
On Thu, Jun 22, 2023 at 11:00 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> Hi All,
>
> This is a series to opportunistically and transparently use contpte mappings
> (set the contiguous bit in ptes) for user memory when those mappings meet the
> requirements. It is part of a wider effort to improve performance of the 4K
> kernel with the aim of approaching the performance of the 16K kernel, but
> without breaking compatibility and without the associated increase in memory. It
> also benefits the 16K and 64K kernels by enabling 2M THP, since this is the
> contpte size for those kernels.
>
> Of course this is only one half of the change. We require the mapped physical
> memory to be the correct size and alignment for this to actually be useful (i.e.
> 64K for 4K pages, or 2M for 16K/64K pages). Fortunately folios are solving this
> problem for us. Filesystems that support it (XFS, AFS, EROFS, tmpfs) will
> allocate large folios up to the PMD size today, and more filesystems are coming.
> And the other half of my work, to enable the use of large folios for anonymous
> memory, aims to make contpte sized folios prevalent for anonymous memory too.
>
>
> Dependencies
> ------------
>
> While there is a complicated set of hard and soft dependencies that this patch
> set depends on, I wanted to split it out as best I could and kick off proper
> review independently.
>
> The series applies on top of these other patch sets, with a tree at:
> https://gitlab.arm.com/linux-arm/linux-rr/-/tree/features/granule_perf/contpte-lkml_v1
>
> v6.4-rc6
> - base
>
> set_ptes()
> - hard dependency
> - Patch set from Matthew Wilcox to set multiple ptes with a single API call
> - Allows arch backend to more optimally apply contpte mappings
> - https://lore.kernel.org/linux-mm/20230315051444.3229621-1-willy@infradead.org/
>
> ptep_get() pte encapsulation
> - hard dependency
> - Enabler series from me to ensure none of the core code ever directly
> dereferences a pte_t that lies within a live page table.
> - Enables gathering access/dirty bits from across the whole contpte range
> - in mm-stable and linux-next at time of writing
> - https://lore.kernel.org/linux-mm/d38dc237-6093-d4c5-993e-e8ffdd6cb6fa@arm.com/
>
> Report on physically contiguous memory in smaps
> - soft dependency
> - Enables visibility on how much memory is physically contiguous and how much
> is contpte-mapped - useful for debug
> - https://lore.kernel.org/linux-mm/20230613160950.3554675-1-ryan.roberts@arm.com/
>
> Additionally there are a couple of other dependencies:
>
> anonfolio
> - soft dependency
> - ensures more anonymous memory is allocated in contpte-sized folios, so
> needed to realize the performance improvements (this is the "other half"
> mentioned above).
> - RFC: https://lore.kernel.org/linux-mm/20230414130303.2345383-1-ryan.roberts@arm.com/
> - Intending to post v1 shortly.
>
> exefolio
> - soft dependency
> - Tweak readahead to ensure executable memory are in 64K-sized folios, so
> needed to see reduction in iTLB pressure.
> - Don't intend to post this until we are further down the track with contpte
> and anonfolio.
>
> Arm ARM Clarification
> - hard dependency
> - Current wording disallows the fork() optimization in the final patch.
> - Arm (ATG) have proposed tightening the wording to permit it.
> - In conversation with partners to check this wouldn't cause problems for any
> existing HW deployments
>
> All of the _hard_ dependencies need to be resolved before this can be considered
> for merging.
>
>
> Performance
> -----------
>
> Below results show 2 benchmarks; kernel compilation and speedometer 2.0 (a
> javascript benchmark running in Chromium). Both cases are running on Ampere
> Altra with 1 NUMA node enabled, Ubuntu 22.04 and XFS filesystem. Each benchmark
> is repeated 15 times over 5 reboots and averaged.
>
> All improvements are relative to baseline-4k. anonfolio and exefolio are as
> described above. contpte is this series. (Note that exefolio only gives an
> improvement because contpte is already in place).
>
> Kernel Compilation (smaller is better):
>
> | kernel | real-time | kern-time | user-time |
> |:-------------|------------:|------------:|------------:|
> | baseline-4k | 0.0% | 0.0% | 0.0% |
> | anonfolio | -5.4% | -46.0% | -0.3% |
> | contpte | -6.8% | -45.7% | -2.1% |
> | exefolio | -8.4% | -46.4% | -3.7% |
sorry i am a bit confused. in exefolio case, is anonfolio included?
or it only has large cont-pte folios on exe code? in the other words,
Does the 8.4% improvement come from iTLB miss reduction only,
or from both dTLB and iTLB miss reduction?
> | baseline-16k | -8.7% | -49.2% | -3.7% |
> | baseline-64k | -10.5% | -66.0% | -3.5% |
>
> Speedometer 2.0 (bigger is better):
>
> | kernel | runs_per_min |
> |:-------------|---------------:|
> | baseline-4k | 0.0% |
> | anonfolio | 1.2% |
> | contpte | 3.1% |
> | exefolio | 4.2% |
same question as above.
> | baseline-16k | 5.3% |
>
> I've also run Speedometer 2.0 on Pixel 6 with an Ubuntu SW stack and see similar
> gains.
>
> I've also verified that running the contpte changes without anonfolio and
> exefolio does not cause any regression vs baseline-4k.
>
>
> Opens
> -----
>
> The only potential issue that I see right now is that due to there only being 1
> access/dirty bit per contpte range, if a single page in the range is
> accessed/dirtied then all the adjacent pages are reported as accessed/dirtied
> too. Access/dirty is managed by the kernel per _folio_, so this information gets
> collapsed down anyway, and nothing changes there. However, the per _page_
> access/dirty information is reported through pagemap to user space. I'm not sure
> if this would/should be considered a break? Thoughts?
>
> Thanks,
> Ryan
Thanks
Barry
next prev parent reply other threads:[~2023-07-10 12:05 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-22 14:41 Ryan Roberts
2023-06-22 14:41 ` [PATCH v1 01/14] arm64/mm: set_pte(): New layer to manage contig bit Ryan Roberts
2023-06-22 14:41 ` [PATCH v1 02/14] arm64/mm: set_ptes()/set_pte_at(): " Ryan Roberts
2023-06-22 14:41 ` [PATCH v1 03/14] arm64/mm: pte_clear(): " Ryan Roberts
2023-06-22 14:41 ` [PATCH v1 04/14] arm64/mm: ptep_get_and_clear(): " Ryan Roberts
2023-06-22 14:42 ` [PATCH v1 05/14] arm64/mm: ptep_test_and_clear_young(): " Ryan Roberts
2023-06-22 14:42 ` [PATCH v1 06/14] arm64/mm: ptep_clear_flush_young(): " Ryan Roberts
2023-06-22 14:42 ` [PATCH v1 07/14] arm64/mm: ptep_set_wrprotect(): " Ryan Roberts
2023-06-22 14:42 ` [PATCH v1 08/14] arm64/mm: ptep_set_access_flags(): " Ryan Roberts
2023-06-22 14:42 ` [PATCH v1 09/14] arm64/mm: ptep_get(): " Ryan Roberts
2023-06-22 14:42 ` [PATCH v1 10/14] arm64/mm: Split __flush_tlb_range() to elide trailing DSB Ryan Roberts
2023-06-22 14:42 ` [PATCH v1 11/14] arm64/mm: Wire up PTE_CONT for user mappings Ryan Roberts
2023-06-30 1:54 ` John Hubbard
2023-07-03 9:48 ` Ryan Roberts
2023-07-03 15:17 ` Catalin Marinas
2023-07-04 11:09 ` Ryan Roberts
2023-07-05 13:13 ` Ryan Roberts
2023-07-16 15:09 ` Catalin Marinas
2023-06-22 14:42 ` [PATCH v1 12/14] arm64/mm: Add ptep_get_and_clear_full() to optimize process teardown Ryan Roberts
2023-06-22 14:42 ` [PATCH v1 13/14] mm: Batch-copy PTE ranges during fork() Ryan Roberts
2023-06-22 14:42 ` [PATCH v1 14/14] arm64/mm: Implement ptep_set_wrprotects() to optimize fork() Ryan Roberts
2023-07-10 12:05 ` Barry Song [this message]
2023-07-10 13:28 ` [PATCH v1 00/14] Transparent Contiguous PTEs for User Mappings Ryan Roberts
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAGsJ_4x4b5Qe2RNTHyR2MQqSkRxcuchrUgrap9WTaDtuMUttcA@mail.gmail.com \
--to=21cnbao@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=andreyknvl@gmail.com \
--cc=anshuman.khandual@arm.com \
--cc=ardb@kernel.org \
--cc=catalin.marinas@arm.com \
--cc=dvyukov@google.com \
--cc=glider@google.com \
--cc=james.morse@arm.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mark.rutland@arm.com \
--cc=maz@kernel.org \
--cc=oliver.upton@linux.dev \
--cc=ryabinin.a.a@gmail.com \
--cc=ryan.roberts@arm.com \
--cc=suzuki.poulose@arm.com \
--cc=vincenzo.frascino@arm.com \
--cc=will@kernel.org \
--cc=willy@infradead.org \
--cc=yuzenghui@huawei.com \
--cc=yuzhao@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox