From: Ryan Roberts <ryan.roberts@arm.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <willy@infradead.org>,
Yin Fengwei <fengwei.yin@intel.com>,
David Hildenbrand <david@redhat.com>, Yu Zhao <yuzhao@google.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Yang Shi <shy828301@gmail.com>, Zi Yan <ziy@nvidia.com>,
Luis Chamberlain <mcgrof@kernel.org>,
Itaru Kitayama <itaru.kitayama@gmail.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v5 3/5] mm: LARGE_ANON_FOLIO for improved performance
Date: Wed, 30 Aug 2023 13:07:01 +0100
Message-ID: <5c9ba378-2920-4892-bdf0-174e47d528b7@arm.com>
In-Reply-To: <87v8dg6lfu.fsf@yhuang6-desk2.ccr.corp.intel.com>

On 15/08/2023 22:32, Huang, Ying wrote:
> Hi, Ryan,
>
> Ryan Roberts <ryan.roberts@arm.com> writes:
>
>> Introduce LARGE_ANON_FOLIO feature, which allows anonymous memory to be
>> allocated in large folios of a determined order. All pages of the large
>> folio are pte-mapped during the same page fault, significantly reducing
>> the number of page faults. The number of per-page operations (e.g. ref
>> counting, rmap management, lru list management) is also significantly
>> reduced since those ops now become per-folio.
>>
>> The new behaviour is hidden behind the new LARGE_ANON_FOLIO Kconfig,
>> which defaults to disabled for now; the long-term aim is for this to
>> default to enabled, but there are some risks around internal
>> fragmentation that need to be better understood first.
>>
>> Large anonymous folio (LAF) allocation is integrated with the existing
>> (PMD-order) THP and single (S) page allocation according to this policy,
>> where fallback (>) is performed for various reasons, such as the
>> proposed folio order not fitting within the bounds of the VMA, etc:
>>
>>                 | prctl=dis | prctl=ena   | prctl=ena     | prctl=ena
>>                 | sysfs=X   | sysfs=never | sysfs=madvise | sysfs=always
>> ----------------|-----------|-------------|---------------|-------------
>> no hint         | S         | LAF>S       | LAF>S         | THP>LAF>S
>> MADV_HUGEPAGE   | S         | LAF>S       | THP>LAF>S     | THP>LAF>S
>> MADV_NOHUGEPAGE | S         | S           | S             | S
>
> IMHO, we should use the following semantics as you have suggested
> before.
>
>                 | prctl=dis | prctl=ena   | prctl=ena     | prctl=ena
>                 | sysfs=X   | sysfs=never | sysfs=madvise | sysfs=always
> ----------------|-----------|-------------|---------------|-------------
> no hint         | S         | S           | LAF>S         | THP>LAF>S
> MADV_HUGEPAGE   | S         | S           | THP>LAF>S     | THP>LAF>S
> MADV_NOHUGEPAGE | S         | S           | S             | S
>
> Or even,
>
>                 | prctl=dis | prctl=ena   | prctl=ena     | prctl=ena
>                 | sysfs=X   | sysfs=never | sysfs=madvise | sysfs=always
> ----------------|-----------|-------------|---------------|-------------
> no hint         | S         | S           | S             | THP>LAF>S
> MADV_HUGEPAGE   | S         | S           | THP>LAF>S     | THP>LAF>S
> MADV_NOHUGEPAGE | S         | S           | S             | S
>
> From the implementation point of view, PTE mapped PMD-sized THP has
> almost no difference with LAF (just some small sized THP). It will be
> confusing to distinguish them from the interface point of view.
>
> So, IMHO, the real difference is the policy. For example, prefer
> PMD-sized THP, prefer small sized THP, or fully auto. The sysfs
> interface is used to specify system global policy. In the long term, it
> can be something like below,
>
> never:   S          # disable all THP
> madvise:            # never by default, control via madvise()
> always:  THP>LAF>S  # prefer PMD-sized THP in fact
> small:   LAF>S      # prefer small sized THP
> auto:               # use in-kernel heuristics for THP size
>
> But it may be not ready to add new policies now. So, before the new
> policies are ready, we can add a debugfs interface to override the
> original policy in /sys/kernel/mm/transparent_hugepage/enabled. After
> we have tuned enough workloads, collected enough data, we can add new
> policies to the sysfs interface.

I think we can all imagine many policy options. But we don't really have much
evidence yet for what is best. The policy I'm currently using is intended to
give some flexibility for testing (use LAF without THP by setting sysfs=never,
use THP without LAF by compiling without LAF) without adding any new knobs at
all. Given that, surely we can defer these decisions until we have more data?

In the absence of data, your proposed solution sounds very sensible to me. But
for the purposes of scaling up perf testing, I don't think it's essential given
the current policy will also produce the same options.

If we were going to add a debugfs knob, I think the higher priority would be a
knob to specify the folio order (but again, I would rather avoid it if possible).

Thanks,
Ryan
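
For readers following the thread, the policy table quoted from the v5 commit
message can be modelled in a few lines of Python. This is only an illustrative
userspace sketch, not code from the patch: the names Hint, fallback_chain and
laf_built are hypothetical, and the laf_built=False behaviour is an assumption
drawn from the remark above that THP can be used without LAF by compiling
without LAF.

```python
from enum import Enum


class Hint(Enum):
    """Per-VMA madvise() hint (hypothetical names for this sketch)."""
    NONE = "no hint"
    HUGEPAGE = "MADV_HUGEPAGE"
    NOHUGEPAGE = "MADV_NOHUGEPAGE"


def fallback_chain(prctl_enabled, sysfs, hint, laf_built=True):
    """Return the allocation sizes to try, in order, for an anon fault.

    Models the table in the v5 commit message:
      S   = single page
      LAF = large anonymous folio
      THP = PMD-order transparent huge page
    laf_built stands in for the LARGE_ANON_FOLIO Kconfig (assumption).
    """
    if sysfs not in ("never", "madvise", "always"):
        raise ValueError(f"unknown sysfs setting: {sysfs!r}")

    # THP disabled via prctl, or MADV_NOHUGEPAGE: single pages only.
    if not prctl_enabled or hint is Hint.NOHUGEPAGE:
        return ["S"]

    # PMD-order THP is tried first when sysfs=always, or when
    # sysfs=madvise and the VMA carries MADV_HUGEPAGE.
    thp = sysfs == "always" or (sysfs == "madvise" and hint is Hint.HUGEPAGE)

    chain = []
    if thp:
        chain.append("THP")
    if laf_built:
        chain.append("LAF")
    chain.append("S")  # a single page is always the final fallback
    return chain


if __name__ == "__main__":
    # LAF without THP: set sysfs=never.
    print(">".join(fallback_chain(True, "never", Hint.NONE)))
    # THP without LAF: build without LARGE_ANON_FOLIO.
    print(">".join(fallback_chain(True, "always", Hint.NONE, laf_built=False)))
```

The two calls at the bottom correspond to the two testing configurations
mentioned in the reply; the alternative tables proposed in the quoted text
would differ only in the sysfs=never and sysfs=madvise columns.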