From: Ryan Roberts <ryan.roberts@arm.com>
To: Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <willy@infradead.org>,
Yin Fengwei <fengwei.yin@intel.com>,
David Hildenbrand <david@redhat.com>, Yu Zhao <yuzhao@google.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Yang Shi <shy828301@gmail.com>,
"Huang, Ying" <ying.huang@intel.com>, Zi Yan <ziy@nvidia.com>,
Luis Chamberlain <mcgrof@kernel.org>,
Itaru Kitayama <itaru.kitayama@gmail.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v5 0/5] variable-order, large folios for anonymous memory
Date: Thu, 10 Aug 2023 16:13:39 +0100
Message-ID: <fae40dbd-4d80-4058-83b3-c85c35093c87@arm.com>
In-Reply-To: <20230810142942.3169679-1-ryan.roberts@arm.com>
On 10/08/2023 15:29, Ryan Roberts wrote:
> Hi All,
>
> This is v5 of a series to implement variable-order, large folios for anonymous
> memory (currently called "LARGE_ANON_FOLIO", previously called "FLEXIBLE_THP").
> The objective of this is to improve performance by allocating larger chunks of
> memory during anonymous page faults:
>
> 1) Since SW (the kernel) is dealing with larger chunks of memory than base
> pages, there are efficiency savings to be had; fewer page faults, batched PTE
> and RMAP manipulation, reduced LRU list length, etc. In short, we reduce kernel
> overhead. This should benefit all architectures.
> 2) Since we are now mapping physically contiguous chunks of memory, we can take
> advantage of HW TLB compression techniques. A reduction in TLB pressure
> speeds up kernel and user space. arm64 systems have 2 mechanisms to coalesce
> TLB entries; "the contiguous bit" (architectural) and HPA (uarch).
>
> This patch set deals with the SW side of things (1). (2) is being tackled in a
> separate series. The new behaviour is hidden behind a new Kconfig switch,
> LARGE_ANON_FOLIO, which is disabled by default, although the eventual aim is to
> enable it by default.
>
> My hope is that we are pretty much there with the changes at this point;
> hopefully this is sufficient to get an initial version merged so that we can
> scale up characterization efforts. The patches should not be merged, however,
> until the prerequisites are complete. These are in progress and tracked at [5].
>
> This series is based on mm-unstable (ad3232df3e41).
>
> I'm going to be out on holiday from the end of today, returning on 29th
> August. So responses will likely be patchy, as I'm terrified of posting
> to list from my phone!
>
>
> Testing
> -------
>
> This version adds patches to mm selftests so that the cow tests explicitly test
> large anon folios, in the same way that THP is tested. When enabled, you should
> see something similar to the following at the start of the test suite:
>
> # [INFO] detected large anon folio size: 32 KiB
>
> Then the following results are expected. The fails and skips are due to existing
> issues in mm-unstable:
>
> # Totals: pass:207 fail:16 xfail:0 xpass:0 skip:85 error:0
Oops, the above are the results when running with SWAP disabled. This is what
you would normally see when SWAP is enabled:
# Totals: pass:291 fail:16 xfail:0 xpass:0 skip:1 error:0
>
> Existing mm selftests reveal 1 regression in khugepaged tests when
> LARGE_ANON_FOLIO is enabled:
>
> Run test: collapse_max_ptes_none (khugepaged:anon)
> Maybe collapse with max_ptes_none exceeded.... Fail
> Unexpected huge page
>
> I believe this is because khugepaged currently skips non-order-0 pages when
> looking for collapse opportunities and should get fixed with the help of
> DavidH's work to create a mechanism to precisely determine shared vs exclusive
> pages.
>
>
> Changes since v4 [4]
> --------------------
>
> - Removed "arm64: mm: Override arch_wants_pte_order()" patch; arm64
> now uses the default order-3 size. I have moved this patch over to
> the contpte series.
> - Added "mm: Allow deferred splitting of arbitrary large anon folios" back
> into series. I originally removed this at v2 to add to a separate series,
> but that series has transformed significantly and it no longer fits, so
> bringing it back here.
> - Reintroduced dependency on set_ptes(); originally dropped this at v2, but
> set_ptes() is in mm-unstable now.
> - Updated policy for when to allocate LAF; only fall back to order-0 if
> MADV_NOHUGEPAGE is present or if THP is disabled via prctl; no longer rely on
> sysfs's never/madvise/always knob.
> - Fall back to order-0 whenever uffd is armed for the vma, not just when
> uffd-wp is set on the pte.
> - alloc_anon_folio() now returns `struct folio *`, where errors are encoded
> with ERR_PTR().
>
> The last 3 changes were proposed by Yu Zhao - thanks!
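
For readers following the v5 changes above, here is a rough userspace sketch
of the two ideas involved: an allocator that returns either a valid pointer
or an errno encoded in the pointer value, and a policy that drops to order-0
when MADV_NOHUGEPAGE is set, THP is disabled via prctl, or uffd is armed. It
is not code from the series. ERR_PTR()/PTR_ERR()/IS_ERR() are simplified
stand-ins for the kernel helpers of the same names; `struct fault_ctx`,
`alloc_anon_folio_sketch()`, `DEFAULT_ORDER` and the `simulate_oom` flag are
hypothetical names introduced only for illustration, and the use of -ENOMEM
is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
    /* The top MAX_ERRNO values of the address space encode errors. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, simplified folio: only the order is tracked. */
struct folio {
    unsigned int order;
};

/* Hypothetical summary of the state the fault path would consult. */
struct fault_ctx {
    bool madv_nohugepage;     /* vma has MADV_NOHUGEPAGE */
    bool thp_disabled_prctl;  /* THP disabled for the process via prctl */
    bool uffd_armed;          /* userfaultfd is armed for the vma */
    bool simulate_oom;        /* test hook: pretend allocation failed */
};

#define DEFAULT_ORDER 3 /* the default order mentioned in the cover letter */

/* Returns a folio on success, or an ERR_PTR()-encoded errno on failure. */
static struct folio *alloc_anon_folio_sketch(const struct fault_ctx *ctx)
{
    unsigned int order = DEFAULT_ORDER;
    struct folio *folio;

    /* Policy: only these three conditions force order-0. */
    if (ctx->madv_nohugepage || ctx->thp_disabled_prctl || ctx->uffd_armed)
        order = 0;

    if (ctx->simulate_oom)
        return ERR_PTR(-ENOMEM);

    folio = malloc(sizeof(*folio));
    if (!folio)
        return ERR_PTR(-ENOMEM);

    folio->order = order;
    return folio;
}
```

A caller would check the result with IS_ERR() before dereferencing it and
recover the error code with PTR_ERR(), rather than comparing against NULL.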
>
>
> Changes since v3 [3]
> --------------------
>
> - Renamed feature from FLEXIBLE_THP to LARGE_ANON_FOLIO.
> - Removed `flexthp_unhinted_max` boot parameter. Discussion concluded that a
> sysctl is preferable but we will wait until a real workload needs it.
> - Fixed uninitialized `addr` on read fault path in do_anonymous_page().
> - Added mm selftests for large anon folios in cow test suite.
>
>
> Changes since v2 [2]
> --------------------
>
> - Dropped commit "Allow deferred splitting of arbitrary large anon folios"
> - Huang, Ying suggested the "batch zap" work (which I dropped from this
> series after v1) is a prerequisite for merging FLEXIBLE_THP, so I've
> moved the deferred split patch to a separate series along with the batch
> zap changes. I plan to submit this series early next week.
> - Changed folio order fallback policy
> - We no longer iterate from preferred to 0 looking for acceptable policy
> - Instead we iterate through preferred, PAGE_ALLOC_COSTLY_ORDER and 0 only
> - Removed vma parameter from arch_wants_pte_order()
> - Added command line parameter `flexthp_unhinted_max`
> - clamps preferred order when vma hasn't explicitly opted-in to THP
> - Never allocate large folio for MADV_NOHUGEPAGE vma (or when THP is disabled
> for process or system).
> - Simplified implementation and integration with do_anonymous_page()
> - Removed dependency on set_ptes()
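
The v3 order fallback policy listed above (try the preferred order, then
PAGE_ALLOC_COSTLY_ORDER, then 0, skipping all other intermediate orders) can
be sketched in userspace C as follows. This is only an illustration under
assumptions, not the code from the patch: `pick_order()`, `order_ok_fn` and
`order_at_most()` are hypothetical names, and what makes an order
"acceptable" is left to a caller-supplied predicate because the cover letter
does not spell it out. PAGE_ALLOC_COSTLY_ORDER is 3, as in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_ALLOC_COSTLY_ORDER 3 /* same value as in the kernel */

/* Hypothetical predicate: is this order acceptable for the fault? */
typedef bool (*order_ok_fn)(unsigned int order, void *arg);

/*
 * Try at most three candidates: preferred, PAGE_ALLOC_COSTLY_ORDER (only
 * if it is below preferred), and finally 0, which is always accepted.
 */
static unsigned int pick_order(unsigned int preferred, order_ok_fn ok,
                               void *arg)
{
    unsigned int candidates[2];
    size_t n = 0, i;

    candidates[n++] = preferred;
    if (preferred > PAGE_ALLOC_COSTLY_ORDER)
        candidates[n++] = PAGE_ALLOC_COSTLY_ORDER;

    for (i = 0; i < n; i++) {
        if (candidates[i] == 0 || ok(candidates[i], arg))
            return candidates[i];
    }
    return 0; /* order-0 is the unconditional last resort */
}

/* Example predicate: accept any order up to the limit pointed to by arg. */
static bool order_at_most(unsigned int order, void *arg)
{
    return order <= *(unsigned int *)arg;
}
```

With a preferred order of 4 and a limit of 2, this returns 0 rather than 2,
which is the difference from the earlier "iterate from preferred to 0"
approach.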
>
>
> Changes since v1 [1]
> --------------------
>
> - removed changes to arch-dependent vma_alloc_zeroed_movable_folio()
> - replaced with arch-independent alloc_anon_folio()
> - follows THP allocation approach
> - no longer retry with intermediate orders if allocation fails
> - fallback directly to order-0
> - remove folio_add_new_anon_rmap_range() patch
> - instead add its new functionality to folio_add_new_anon_rmap()
> - remove batch-zap pte mappings optimization patch
> - remove enabler folio_remove_rmap_range() patch too
> - These offer real perf improvement so will submit separately
> - simplify Kconfig
> - single FLEXIBLE_THP option, which is independent of arch
> - depends on TRANSPARENT_HUGEPAGE
> - when enabled default to max anon folio size of 64K unless arch
> explicitly overrides
> - simplify changes to do_anonymous_page():
> - no more retry loop
>
>
> [1] https://lore.kernel.org/linux-mm/20230626171430.3167004-1-ryan.roberts@arm.com/
> [2] https://lore.kernel.org/linux-mm/20230703135330.1865927-1-ryan.roberts@arm.com/
> [3] https://lore.kernel.org/linux-mm/20230714160407.4142030-1-ryan.roberts@arm.com/
> [4] https://lore.kernel.org/linux-mm/20230726095146.2826796-1-ryan.roberts@arm.com/
> [5] https://lore.kernel.org/linux-mm/f8d47176-03a8-99bf-a813-b5942830fd73@arm.com/
>
>
> Thanks,
> Ryan
>
> Ryan Roberts (5):
> mm: Allow deferred splitting of arbitrary large anon folios
> mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()
> mm: LARGE_ANON_FOLIO for improved performance
> selftests/mm/cow: Generalize do_run_with_thp() helper
> selftests/mm/cow: Add large anon folio tests
>
> include/linux/pgtable.h | 13 ++
> mm/Kconfig | 10 ++
> mm/memory.c | 144 +++++++++++++++++--
> mm/rmap.c | 31 +++--
> tools/testing/selftests/mm/cow.c | 229 ++++++++++++++++++++++---------
> 5 files changed, 347 insertions(+), 80 deletions(-)
>
> --
> 2.25.1
>