From: Yu Zhao <yuzhao@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <willy@infradead.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Yin Fengwei <fengwei.yin@intel.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Yang Shi <shy828301@gmail.com>,
"Huang, Ying" <ying.huang@intel.com>, Zi Yan <ziy@nvidia.com>,
Luis Chamberlain <mcgrof@kernel.org>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v3 3/4] mm: FLEXIBLE_THP for improved performance
Date: Mon, 17 Jul 2023 11:07:57 -0600
Message-ID: <CAOUHufYnVdxoKgvxFmk7e0KqtOV9=zWQ-vjVX7JOLNM-cRKR9Q@mail.gmail.com>
In-Reply-To: <82c934af-a777-3437-8d87-ff453ad94bfd@redhat.com>
On Mon, Jul 17, 2023 at 7:06 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 14.07.23 19:17, Yu Zhao wrote:
> > On Fri, Jul 14, 2023 at 10:17 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> Introduce FLEXIBLE_THP feature, which allows anonymous memory to be
> >> allocated in large folios of a determined order. All pages of the large
> >> folio are pte-mapped during the same page fault, significantly reducing
> >> the number of page faults. The number of per-page operations (e.g. ref
> >> counting, rmap management, lru list management) is also significantly
> >> reduced since those ops now become per-folio.
> >>
> >> The new behaviour is hidden behind the new FLEXIBLE_THP Kconfig, which
> >> defaults to disabled for now; the long-term aim is for this to default to
> >> enabled, but there are some risks around internal fragmentation that
> >> need to be better understood first.
> >>
> >> When enabled, the folio order is determined as such: For a vma, process
> >> or system that has explicitly disabled THP, we continue to allocate
> >> order-0. THP is most likely disabled to avoid any possible internal
> >> fragmentation so we honour that request.
> >>
> >> Otherwise, the return value of arch_wants_pte_order() is used. For vmas
> >> that have not explicitly opted-in to use transparent hugepages (e.g.
> >> where thp=madvise and the vma does not have MADV_HUGEPAGE), then
> >> arch_wants_pte_order() is limited by the new cmdline parameter,
> >> `flexthp_unhinted_max`. This allows for a performance boost without
> >> requiring any explicit opt-in from the workload while allowing the
> >> sysadmin to tune between performance and internal fragmentation.
> >>
> >> arch_wants_pte_order() can be overridden by the architecture if desired.
> >> Some architectures (e.g. arm64) can coalesce TLB entries if a contiguous
> >> set of ptes map physically contiguous, naturally aligned memory, so this
> >> mechanism allows the architecture to optimize as required.
> >>
> >> If the preferred order can't be used (e.g. because the folio would
> >> breach the bounds of the vma, or because ptes in the region are already
> >> mapped) then we fall back to a suitable lower order; first
> >> PAGE_ALLOC_COSTLY_ORDER, then order-0.
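
For illustration only, the order selection described in the commit message
above could be sketched roughly as follows. This is not the code from the
patch: the struct, the sketch_* names and the page-index model of a vma are
hypothetical, the check for already-mapped ptes is left out, and
SKETCH_COSTLY_ORDER assumes the value 3 that PAGE_ALLOC_COSTLY_ORDER has in
current kernels.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: stands in for PAGE_ALLOC_COSTLY_ORDER. */
#define SKETCH_COSTLY_ORDER 3

/* Hypothetical, simplified view of a vma in units of pages. */
struct sketch_vma {
	unsigned long start_pg;	/* first page index covered by the vma */
	unsigned long end_pg;	/* one past the last page index */
	bool thp_disabled;	/* vma, process or system opted out of THP */
	bool thp_hinted;	/* vma explicitly opted in, e.g. MADV_HUGEPAGE */
};

/* True if the naturally aligned folio of 'order' around 'fault_pg' lies inside the vma. */
bool sketch_fits(const struct sketch_vma *vma, unsigned long fault_pg, int order)
{
	unsigned long nr = 1UL << order;
	unsigned long first = fault_pg & ~(nr - 1);

	return first >= vma->start_pg && first + nr <= vma->end_pg;
}

/* Pick a folio order: preferred order, then the costly order, then order-0. */
int sketch_pick_order(const struct sketch_vma *vma, unsigned long fault_pg,
		      int arch_order, int unhinted_max_order)
{
	int order = arch_order;

	assert(arch_order >= 0 && unhinted_max_order >= 0);

	/* THP explicitly disabled: honour the request and stay at order-0. */
	if (vma->thp_disabled)
		return 0;

	/* No explicit opt-in: cap the arch preference at the unhinted maximum. */
	if (!vma->thp_hinted && order > unhinted_max_order)
		order = unhinted_max_order;

	if (sketch_fits(vma, fault_pg, order))
		return order;

	/* The preferred order does not fit: try the costly order before giving up. */
	if (order > SKETCH_COSTLY_ORDER && sketch_fits(vma, fault_pg, SKETCH_COSTLY_ORDER))
		return SKETCH_COSTLY_ORDER;

	return 0;
}
```

With a 16-page vma starting at page 4 and a fault at page 10, a preferred
order of 4 would breach the vma, so this sketch falls back to order 3.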
> >>
> >> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> >> ---
> >> .../admin-guide/kernel-parameters.txt | 10 +
> >> mm/Kconfig | 10 +
> >> mm/memory.c | 187 ++++++++++++++++--
> >> 3 files changed, 190 insertions(+), 17 deletions(-)
> >>
> >> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> >> index a1457995fd41..405d624e2191 100644
> >> --- a/Documentation/admin-guide/kernel-parameters.txt
> >> +++ b/Documentation/admin-guide/kernel-parameters.txt
> >> @@ -1497,6 +1497,16 @@
> >> See Documentation/admin-guide/sysctl/net.rst for
> >> fb_tunnels_only_for_init_ns
> >>
> >> + flexthp_unhinted_max=
> >> + [KNL] Requires CONFIG_FLEXIBLE_THP enabled. The maximum
> >> + folio size that will be allocated for an anonymous vma
> >> + that has neither explicitly opted in nor out of using
> >> + transparent hugepages. The size must be a power-of-2 in
> >> + the range [PAGE_SIZE, PMD_SIZE). A larger size improves
> >> + performance by reducing page faults, while a smaller
> >> + size reduces internal fragmentation. Default: max(64K,
> >> + PAGE_SIZE). Format: size[KMG].
> >> +
> >
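
As a rough userspace sketch of the constraints documented for
flexthp_unhinted_max above (a power-of-2 size in [PAGE_SIZE, PMD_SIZE),
default max(64K, PAGE_SIZE), format size[KMG]), something like the following
could apply. All sketch_* names are hypothetical, the 4K page and 2M PMD
sizes are assumptions, and falling back to the default on an invalid value
is my assumption rather than documented behaviour of the patch. In the
kernel, memparse() would normally do the suffix parsing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Assumptions: 4K base pages and a 2M PMD, as on common 4K-page configs. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PMD_SIZE  (2ULL << 20)

/* Parse "size[KMG]" into bytes; anything after the suffix is ignored. */
unsigned long long sketch_parse_size(const char *s)
{
	char *end;
	unsigned long long value;
	unsigned int shift = 0;

	assert(s != NULL);
	value = strtoull(s, &end, 0);

	switch (*end) {
	case 'K': case 'k': shift = 10; break;
	case 'M': case 'm': shift = 20; break;
	case 'G': case 'g': shift = 30; break;
	default: break;
	}
	return value << shift;
}

/* Valid means: a power of 2 within [PAGE_SIZE, PMD_SIZE). */
bool sketch_size_valid(unsigned long long size)
{
	bool pow2 = size != 0 && (size & (size - 1)) == 0;

	return pow2 && size >= SKETCH_PAGE_SIZE && size < SKETCH_PMD_SIZE;
}

/* Documented default: max(64K, PAGE_SIZE). */
unsigned long long sketch_default_size(void)
{
	unsigned long long dflt = 64ULL << 10;

	return dflt > SKETCH_PAGE_SIZE ? dflt : SKETCH_PAGE_SIZE;
}

/* Assumed policy: keep the default when the supplied value is invalid. */
unsigned long long sketch_unhinted_max(const char *arg)
{
	unsigned long long size = sketch_parse_size(arg);

	return sketch_size_valid(size) ? size : sketch_default_size();
}
```

Under these assumptions "16K" is accepted as is, while "2M" is rejected
because the range excludes PMD_SIZE itself.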
> > Let's split this parameter into a separate patch.
> >
>
> Just a general comment after stumbling over patch #2, let's not start
> splitting patches into things that don't make any sense on their own;
> that just makes review a lot harder.

Sorry to hear this -- but there are also non-subjective reasons we
split patches this way.

Initially we had minimal to no common ground, so we had to divide and
conquer in the smallest possible steps.

If you look at the previous discussions, there was a disagreement on
patch 2 in v2 -- that's the patch you asked to be squashed into the
main patch 3. Fortunately we've resolved that. If that disagreement had
persisted, we would have left patch 2 out rather than let it bog down
patch 3, which would work the same way for all arches except arm and
could be merged separately.

> For this case here, I'd suggest first adding the general infrastructure
> and then adding tunables we want to have on top.
>
> I agree that toggling that at runtime (for example via sysfs as raised
> by me previously) would be nicer.