Re: [PATCH 0/2] mm: swap: mTHP swap allocator base on swap cluster order

From: "Huang, Ying" <ying.huang@intel.com>
To: Chris Li <chrisl@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	 Kairui Song <kasong@tencent.com>,
	 Ryan Roberts <ryan.roberts@arm.com>,
	linux-kernel@vger.kernel.org,  linux-mm@kvack.org,
	 Barry Song <baohua@kernel.org>
Subject: Re: [PATCH 0/2] mm: swap: mTHP swap allocator base on swap cluster order
Date: Wed, 19 Jun 2024 17:21:56 +0800
Message-ID: <87h6dp479n.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <CAF8kJuOfYMiD-aEhLa9i+oxAtasDcPhFb6__i6QRB2dGO1Lhcg@mail.gmail.com> (Chris Li's message of "Tue, 18 Jun 2024 02:31:58 -0700")

Chris Li <chrisl@kernel.org> writes:

> On Mon, Jun 17, 2024 at 11:56 PM Huang, Ying <ying.huang@intel.com> wrote:
>>
>> Chris Li <chrisl@kernel.org> writes:
>>
>> > That is in general true with all kernel development regardless of
>> > using options or not. If there is a bug in my patch, I will need to
>> > debug and fix it or the patch might be reverted.
>> >
>> > I don't see that as a reason to take the option path or not. The
>> > option just means the user taking this option will need to understand
>> > the trade off and accept the defined behavior of that option.
>>
>> User configuration knobs are not forbidden in the Linux kernel.  But we are
>> more careful about them because they introduce ABI which we need to
>> maintain forever.  And they are hard for users to use.  Optimizing
>> automatically is generally the better solution.  So, I suggest you
>> think more about the automatic solution before diving into a new
>> option.
>
> I did, see my reply. Right now there are just no other options.
>
>>
>> >>
>> >> >> So, I prefer the transparent methods.  Just like THP vs. hugetlbfs.
>> >> >
>> >> > Me too. I prefer transparent over reservation if it can achieve the
>> >> > same goal. Do we have a fully transparent method specced out? How do
>> >> > we achieve full transparency and also avoid the fragmentation caused
>> >> > by mixed-order allocation/free?
>> >> >
>> >> > Keep in mind that we are still in the early stage of the mTHP swap
>> >> > development, I can have the reservation patch relatively easily. If
>> >> > you come up with a better transparent method patch which can achieve
>> >> > the same goal later, we can use it instead.
>> >>
>> >> Because we are still in the early stage, I think that we should try to
>> >> improve the transparent solution first.  Personally, what I don't like is
>> >> that we don't work on the transparent solution because we have the
>> >> reservation solution.
>> >
>> > Do you have a road map or the design for the transparent solution you can share?
>> > I am interested to know what the short-term steps (e.g. a month) are in
>> > this transparent solution you have in mind, so we can compare the
>> > different approaches. I can't reason much just from the name
>> > "transparent solution" itself. I need more technical details.
>> >
>> > Right now we have a clear usage case we want to support, the swap
>> > in/out mTHP with bigger zsmalloc buffers. We can start with the
>> > limited usage case first then move to more general ones.
>>
>> TBH, This is what I don't like.  It appears that you refuse to think
>> about the transparent (or automatic) solution.
>
> Actually, that is not true, you make the wrong assumption about what I
> have considered. I want to find out what you have in mind to compare
> the near term solutions.

Sorry about my wrong assumption.

> In my recent LSF slide I already listed 3 options to address this
> fragmentation problem.
> From easy to hard:
> 1) Assign cluster an order on allocation and remember the cluster
> order. (short term).
> That is this patch series
> 2) Buddy allocation on the swap entry (longer term)
> 3) Folio write out compound discontinuous swap entry. (ultimate)
>
> I also considered 4), which I did not put into the slide, because it
> is less effective than 3)
> 4) migrating the swap entries, which requires scanning page table entries.
> I briefly mentioned it during the session.

Or you need something like an rmap, which isn't easy.
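
For reference, option 1) in the list above can be pictured with a toy
user-space model. This is only a sketch under stated assumptions, not the
code from the patch series: the names (toy_cluster, toy_alloc, toy_free),
the 512 slots per cluster, and the rule that an empty cluster loses its
order are all hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <string.h>

#define CLUSTER_SLOTS 512      /* assumed number of swap slots per cluster */
#define NR_CLUSTERS   4        /* toy device: 4 clusters */
#define ORDER_NONE    (-1)     /* cluster is free and has no order yet */

struct toy_cluster {
	int order;                          /* order remembered at first allocation */
	unsigned int used;                  /* slots currently in use */
	unsigned char map[CLUSTER_SLOTS];   /* 1 = slot in use */
};

static struct toy_cluster clusters[NR_CLUSTERS];

static void toy_init(void)
{
	memset(clusters, 0, sizeof(clusters));
	for (int i = 0; i < NR_CLUSTERS; i++)
		clusters[i].order = ORDER_NONE;
}

/* Return the first naturally aligned free run of 2^order slots, or -1. */
static int toy_find_run(const struct toy_cluster *c, int order)
{
	unsigned int nr = 1u << order;

	for (unsigned int off = 0; off + nr <= CLUSTER_SLOTS; off += nr) {
		unsigned int i = 0;

		while (i < nr && !c->map[off + i])
			i++;
		if (i == nr)
			return (int)off;
	}
	return -1;
}

/* Allocate 2^order slots; return the device-wide slot offset, or -1. */
static long toy_alloc(int order)
{
	/* Pass 0: clusters already assigned this order. Pass 1: free clusters. */
	for (int pass = 0; pass < 2; pass++) {
		int want = pass == 0 ? order : ORDER_NONE;

		for (int ci = 0; ci < NR_CLUSTERS; ci++) {
			struct toy_cluster *c = &clusters[ci];
			int off;

			if (c->order != want)
				continue;
			off = toy_find_run(c, order);
			if (off < 0)
				continue;
			c->order = order;   /* remember the order */
			memset(&c->map[off], 1, 1u << order);
			c->used += 1u << order;
			return (long)ci * CLUSTER_SLOTS + off;
		}
	}
	return -1;   /* never falls back to a cluster of a different order */
}

/* Free 2^order slots starting at the device-wide slot offset. */
static void toy_free(long slot, int order)
{
	struct toy_cluster *c = &clusters[slot / CLUSTER_SLOTS];

	assert(c->order == order);
	memset(&c->map[slot % CLUSTER_SLOTS], 0, 1u << order);
	c->used -= 1u << order;
	if (c->used == 0)
		c->order = ORDER_NONE;   /* empty cluster can take any order again */
}
```

Because a cluster only ever holds entries of one order in this model,
order-0 frees cannot punch holes into a cluster that serves order-2
requests. The cost is that toy_alloc() fails once no cluster of the right
order and no free cluster is left, even if other clusters have room.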

> 3) might qualify as your transparent solution. It is just much
> harder to implement.
> Even when we have 3), having some form of 1) can be beneficial as
> well. (less IO count, no indirect layer of swap offset).
>
>>
>> I haven't thought about them thoroughly, but at least we may think about
>>
>> - promoting a low-order non-full cluster when we find free high-order
>>   swap entries.
>>
>> - stealing a low order non-full cluster with low usage count for
>>   high-order allocation.
>
> Now we are talking.
> The two above fall well within 2), the buddy allocator.
> But the buddy allocator will not be able to address all fragmentation
> issues, because the allocator does not control the life cycle of
> the swap entry.
> It will not help Barry's zsmalloc usage case much because Android
> likes to keep the swapfile full. I can already see that.

I think that a buddy-like allocator (not exactly the buddy algorithm) will
help with fragmentation.  And it will help more users because it works
automatically.

I don't think they are too hard to implement.  We can try to find
some simple solutions first.  So, I think that we don't need to push
them to the long term.  At least, they can be done before introducing
the high-order cluster reservation ABI.  Then, we can evaluate the benefit
and overhead of the reservation ABI.
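
As a rough illustration of the "stealing" and "promoting" ideas quoted
above, here is a toy user-space sketch. It is one possible reading, not an
existing kernel interface: toy_cluster, pick_victim() and steal_alloc() are
hypothetical names, the 16-slot cluster is arbitrary, and the policy of
choosing the lower-order cluster with the lowest usage count is an
assumption.

```c
#include <assert.h>
#include <string.h>

#define SLOTS 16   /* toy cluster size, chosen only for the example */

struct toy_cluster {
	int order;                  /* order the cluster currently serves */
	unsigned int used;          /* slots in use (the "usage count") */
	unsigned char map[SLOTS];   /* 1 = slot in use */
};

/* Return the first naturally aligned free run of 2^order slots, or -1. */
static int find_aligned_run(const struct toy_cluster *c, int order)
{
	unsigned int nr = 1u << order;

	for (unsigned int off = 0; off + nr <= SLOTS; off += nr) {
		unsigned int i = 0;

		while (i < nr && !c->map[off + i])
			i++;
		if (i == nr)
			return (int)off;
	}
	return -1;
}

/*
 * Among clusters of a lower order that still contain an aligned free run
 * of the requested order, pick the one with the lowest usage count.
 * Returns the cluster index, or -1 if there is no candidate.
 */
static int pick_victim(const struct toy_cluster *cl, int n, int order)
{
	int best = -1;

	for (int i = 0; i < n; i++) {
		if (cl[i].order >= order)
			continue;
		if (find_aligned_run(&cl[i], order) < 0)
			continue;
		if (best < 0 || cl[i].used < cl[best].used)
			best = i;
	}
	return best;
}

/* Steal from the victim and promote it; return the slot offset, or -1. */
static int steal_alloc(struct toy_cluster *cl, int n, int order)
{
	int v = pick_victim(cl, n, order);
	int off;

	if (v < 0)
		return -1;
	off = find_aligned_run(&cl[v], order);
	memset(&cl[v].map[off], 1, 1u << order);
	cl[v].used += 1u << order;
	cl[v].order = order;   /* promote: the cluster now serves the higher order */
	return v * SLOTS + off;
}
```

The sketch only shows the selection step. It leaves open what happens to
the low-order entries that remain in the promoted cluster, which is where
the real overhead would need to be evaluated.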

>> - freeing more swap entries when swap devices become fragmented.
>
> That requires scanning the page table to free the swap entry, basically 4).

No.  You can just scan the page table of the current process in
do_swap_page() and try to swap in and free more swap entries.  That
doesn't work well for shared pages.  However, I think that it can
help quite a few workloads.
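
A toy model of that idea, only to make the limits visible. Nothing here is
the real do_swap_page() logic: toy_pte, swap_count[] and
toy_swapin_around() are hypothetical, and the fixed window around the
faulting entry is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PTES  8
#define NR_SLOTS 8

struct toy_pte {
	bool is_swap;   /* true: entry points to a swap slot */
	int slot;       /* swap slot index, valid only when is_swap is true */
};

static int swap_count[NR_SLOTS];   /* references held on each swap slot */

/*
 * Handle a fault at ptes[idx]: swap in the faulting entry, then also swap
 * in neighbours within +/- window whose slot is not shared, so that their
 * swap slots can be freed. Returns the number of slots freed.
 */
static int toy_swapin_around(struct toy_pte *ptes, int idx, int window)
{
	int lo = idx - window < 0 ? 0 : idx - window;
	int hi = idx + window >= NR_PTES ? NR_PTES - 1 : idx + window;
	int freed = 0;

	for (int i = lo; i <= hi; i++) {
		int slot;

		if (!ptes[i].is_swap)
			continue;
		slot = ptes[i].slot;
		/* Shared slots cannot be freed from this page table alone. */
		if (i != idx && swap_count[slot] != 1)
			continue;
		ptes[i].is_swap = false;   /* page is resident now */
		if (--swap_count[slot] == 0)
			freed++;
	}
	return freed;
}
```

The faulting entry is always swapped in, but its slot is freed only when
this was the last reference. That is the shared-page limitation mentioned
above.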

> It is all about investment and return. 1) is relatively easy to
> implement and gives good improvement and return.

[snip]

--
Best Regards,
Huang, Ying

