From: "Huang, Ying" <ying.huang@intel.com>
To: Zi Yan <ziy@nvidia.com>
Cc: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
Ryan Roberts <ryan.roberts@arm.com>,
Andrew Morton <akpm@linux-foundation.org>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
David Hildenbrand <david@redhat.com>,
"Yin, Fengwei" <fengwei.yin@intel.com>,
Yu Zhao <yuzhao@google.com>, Vlastimil Babka <vbabka@suse.cz>,
Johannes Weiner <hannes@cmpxchg.org>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Mel Gorman <mgorman@techsingularity.net>,
"Rohan Puri" <rohan.puri15@gmail.com>,
Mcgrof Chamberlain <mcgrof@kernel.org>,
"Adam Manzanares" <a.manzanares@samsung.com>,
John Hubbard <jhubbard@nvidia.com>
Subject: Re: [RFC PATCH 0/4] Enable >0 order folio memory compaction
Date: Tue, 10 Oct 2023 14:08:08 +0800 [thread overview]
Message-ID: <87r0m3ggc7.fsf@yhuang6-desk2.ccr.corp.intel.com> (raw)
In-Reply-To: <14089E95-251E-43A4-AF32-C9773723C810@nvidia.com> (Zi Yan's message of "Mon, 09 Oct 2023 09:43:38 -0400")
Something went wrong with my mailbox. Sorry if you received a duplicate
mail.
Zi Yan <ziy@nvidia.com> writes:
> On 9 Oct 2023, at 3:12, Huang, Ying wrote:
>
>> Hi, Zi,
>>
>> Thanks for your patch!
>>
>> Zi Yan <zi.yan@sent.com> writes:
>>
>>> From: Zi Yan <ziy@nvidia.com>
>>>
>>> Hi all,
>>>
>>> This patchset enables >0 order folio memory compaction, which is one of
>>> the prerequisites for large folio support[1]. It is on top of
>>> mm-everything-2023-09-11-22-56.
>>>
>>> Overview
>>> ===
>>>
>>> To support >0 order folio compaction, the patchset changes how free pages used
>>> for migration are kept during compaction.
>>
>> migrate_pages() can split a large folio on allocation failure. So
>> the minimal implementation could be:
>>
>> - allow migrating large folios in compaction
>> - return -ENOMEM for order > 0 in compaction_alloc()
>>
>> The performance may not be desirable, but that could be a baseline for
>> further optimization.
>
> I would imagine it might cause a regression since compaction might gradually
> split high order folios in the system.
I wouldn't call it a pure regression, since large folios can still be
migrated during compaction with that approach, but it's possible that
this hurts performance.
Anyway, this can serve as a not-so-good minimal baseline.
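To make sure we mean the same thing, below is a rough, untested sketch
of that baseline on top of the current compaction_alloc() (the early
return for large folios is the only change; migrate_pages() treats the
NULL result as -ENOMEM and falls back to splitting the source folio):

static struct folio *compaction_alloc(struct folio *src, unsigned long data)
{
	struct compact_control *cc = (struct compact_control *)data;
	struct folio *dst;

	/*
	 * Minimal baseline: no order>0 free pages are kept, so fail the
	 * allocation for large source folios and let migrate_pages()
	 * split @src into order-0 pages and retry those.
	 */
	if (folio_order(src) > 0)
		return NULL;

	if (list_empty(&cc->freepages)) {
		isolate_freepages(cc);
		if (list_empty(&cc->freepages))
			return NULL;
	}

	dst = list_entry(cc->freepages.next, struct folio, lru);
	list_del(&dst->lru);
	cc->nr_freepages--;

	return dst;
}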
> But I can move Patch 4 first to make this the baseline and see how
> system performance changes.
Thanks!
>>
>> And, if we can measure the performance for each step of optimization,
>> that will be even better.
>
> Do you have any benchmark in mind for the performance tests? vm-scalability?
I remember Mel Gorman did some tests for defragmentation before, but
those were for order-0 pages.
>>> Free pages used to be split into
>>> order-0 pages that are post-allocation processed (i.e., PageBuddy flag cleared,
>>> page order stored in page->private zeroed, and page reference set to 1).
>>> Now all free pages are kept in a MAX_ORDER+1 array of page lists based
>>> on their order, without post-allocation processing. When migrate_pages() asks for
>>> a new page, one of the free pages, based on the requested page order, is
>>> then processed and given out.
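Restating my understanding of this bookkeeping with a rough sketch
(field and helper names below are mine, not necessarily what the
patches use):

struct compact_control {
	/* ... */
	/* free pages kept at their original order, one list per order */
	struct list_head freepages[MAX_ORDER + 1];
	/* ... */
};

static struct folio *compaction_alloc(struct folio *src, unsigned long data)
{
	struct compact_control *cc = (struct compact_control *)data;
	int order = folio_order(src);
	struct page *page;

	if (list_empty(&cc->freepages[order]))
		return NULL;	/* or split a higher order free page, see below */

	page = list_first_entry(&cc->freepages[order], struct page, lru);
	list_del(&page->lru);
	/* post-allocation processing is deferred until the page is handed out */
	post_alloc_hook(page, order, __GFP_MOVABLE);

	return page_folio(page);
}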
>>>
>>>
>>> Optimizations
>>> ===
>>>
>>> 1. Free page split is added to increase the migration success rate in case
>>> a source page does not have a matching free page in the free page lists.
>>> Free page merge is possible but not implemented, since the existing
>>> PFN-based buddy page merge algorithm requires the identification of
>>> buddy pages, but free pages kept for memory compaction cannot have
>>> PageBuddy set to avoid confusing other PFN scanners.
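And the free page split, as I read it, could work roughly like the
sketch below (again only my reading; the helper name is mine, not the
patch's): when no free page of the requested order is available, take
one from a higher order list and split it down, putting the unused
halves back on their per-order lists.

static struct page *compaction_take_free_page(struct compact_control *cc,
					      int order)
{
	int cur;

	for (cur = order; cur <= MAX_ORDER; cur++) {
		struct page *page;

		if (list_empty(&cc->freepages[cur]))
			continue;

		page = list_first_entry(&cc->freepages[cur], struct page, lru);
		list_del(&page->lru);

		/* split down to @order, keeping the lower half each time */
		while (cur > order) {
			cur--;
			/* the upper buddy goes back to its per-order list */
			list_add(&page[1 << cur].lru, &cc->freepages[cur]);
		}

		return page;
	}

	return NULL;
}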
>>>
>>> 2. Sort source pages in ascending order before migration is added to
>>
>> Trivial.
>>
>> s/ascending/descending/
>>
>>> reduce free page split. Otherwise, high order free pages might be
>>> prematurely split, causing undesired high order folio migration failures.
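With the s/ascending/descending/ fix above, I read this as roughly the
below (helper name is mine), run on cc->migratepages before calling
migrate_pages(), so that larger source folios are migrated first and
consume the larger free pages before any split is needed:

/* needs <linux/list_sort.h> */
static int src_order_cmp(void *priv, const struct list_head *a,
			 const struct list_head *b)
{
	struct folio *fa = list_entry(a, struct folio, lru);
	struct folio *fb = list_entry(b, struct folio, lru);

	/* descending: higher order source folios first */
	return folio_order(fb) - folio_order(fa);
}

/* ... then, before migrate_pages() in compact_zone(): */
	list_sort(NULL, &cc->migratepages, src_order_cmp);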
>>>
>>>
>>> TODOs
>>> ===
>>>
>>> 1. Refactor the free page post-allocation and free page preparation code so
>>> that compaction_alloc() and compaction_free() can call functions instead
>>> of hard-coding the steps.
>>>
>>> 2. One possible optimization is to allow migrate_pages() to continue
>>> even if get_new_folio() returns NULL. In general, that means there is
>>> not enough memory. But in the >0 order folio compaction case, that means
>>> there is no suitable free page at the source page order. It might be better
>>> to skip that page and finish the rest of the migration to achieve a better
>>> compaction result.
>>
>> We can split the source folio if get_new_folio() returns NULL. So, do
>> we really need this?
>
> It depends. The situation it can benefit is when the system is about
> to allocate a high order page and triggers compaction; it might be possible to
> get the high order free page by migrating a bunch of base pages instead of
> splitting an existing high order folio.
>
>>
>> In general, we may reconsider all further optimizations given that splitting
>> is already available.
>
> In my mind, split should be avoided as much as possible.
If so, should we use the "nosplit" logic in migrate_pages_batch() in some
situations?
> But it really depends
> on the actual situation, e.g., how much effort and cost the compaction wants
> to pay to get memory defragmented. If the system really wants to get a high
> order free page at any cost, split can be used without any issue. But applications
> might lose performance because existing large folios are split just to form a
> new one.
Is it possible that splitting is desirable in some situations? For
example, allocating some large DMA buffers at the cost of large anonymous
folios?
> Like I said in the email, there are tons of optimizations and policies for us
> to explore. We can start with the bare minimum support (if no performance
> regression is observed, we can even start with splitting all high order folios like you
> suggested) and add optimizations one by one.
Sounds good to me! Thanks!
>>
>>> 3. Another possible optimization is to enable free page merge. It is
>>> possible that a to-be-migrated page causes a free page split and then eventually
>>> fails to migrate. Without a free page merge function, we would lose a high
>>> order free page. But a way of identifying free pages kept for memory
>>> compaction is needed to reuse the existing PFN-based buddy page merge.
>>>
>>> 4. The implemented >0 order folio compaction algorithm is quite naive
>>> and does not consider all possible situations. A better algorithm can
>>> improve compaction success rate.
>>>
>>>
>>> Feel free to give comments and ask questions.
>>>
>>> Thanks.
>>>
>>>
>>> [1] https://lore.kernel.org/linux-mm/f8d47176-03a8-99bf-a813-b5942830fd73@arm.com/
>>>
>>> Zi Yan (4):
>>> mm/compaction: add support for >0 order folio memory compaction.
>>> mm/compaction: optimize >0 order folio compaction with free page
>>> split.
>>> mm/compaction: optimize >0 order folio compaction by sorting source
>>> pages.
>>> mm/compaction: enable compacting >0 order folios.
>>>
>>> mm/compaction.c | 205 +++++++++++++++++++++++++++++++++++++++---------
>>> mm/internal.h | 7 +-
>>> 2 files changed, 176 insertions(+), 36 deletions(-)
--
Best Regards,
Huang, Ying