Re: [PATCH 0/2] mm: swap: mTHP swap allocator base on swap cluster order

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Chris Li <chrisl@kernel.org>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Kairui Song <kasong@tencent.com>,
	 Ryan Roberts <ryan.roberts@arm.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	 Barry Song <baohua@kernel.org>
Subject: Re: [PATCH 0/2] mm: swap: mTHP swap allocator base on swap cluster order
Date: Thu, 30 May 2024 14:44:33 -0700	[thread overview]
Message-ID: <CAF8kJuMc3sXKarq3hMPYGFfeqyo81Q63HrE0XtztK9uQkcZacA@mail.gmail.com> (raw)
In-Reply-To: <875xuw1062.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Wed, May 29, 2024 at 7:54 PM Huang, Ying <ying.huang@intel.com> wrote:
>
> Chris Li <chrisl@kernel.org> writes:
>
> > Hi Ying,
> >
> > On Wed, May 29, 2024 at 1:57 AM Huang, Ying <ying.huang@intel.com> wrote:
> >>
> >> Chris Li <chrisl@kernel.org> writes:
> >>
> >> > I am spinning a new version for this series to address two issues
> >> > found in this series:
> >> >
> >> > 1) Oppo discovered a bug in the following line:
> >> > +               ci = si->cluster_info + tmp;
> >> > Should be "tmp / SWAPFILE_CLUSTER" instead of "tmp".
> >> > That is a serious bug but trivial to fix.
> >> >
> >> > 2) order 0 allocation currently blindly scans swap_map disregarding
> >> > the cluster->order.
> >>
> >> IIUC, now, we only scan swap_map[] only if
> >> !list_empty(&si->free_clusters) && !list_empty(&si->nonfull_clusters[order]).
> >> That is, if you doesn't run low swap free space, you will not do that.
> >
> > You can still swap space in order 0 clusters while order 4 runs out of
> > free_cluster
> > or nonfull_clusters[order]. For Android that is a common case.
>
> When we fail to allocate order 4, we will fallback to order 0.  Still
> don't need to scan swap_map[].  But after looking at your below reply, I
> realized that the swap space is almost full at most times in your cases.
> Then, it's possible that we run into scanning swap_map[].
> list_empty(&si->free_clusters) &&
> list_empty(&si->nonfull_clusters[order]) will become true, if we put too
> many clusters in si->percpu_cluster.  So, if we want to avoid to scan
> swap_map[], we can stop add clusters in si->percpu_cluster when swap
> space runs low.  And maybe take clusters out of si->percpu_cluster
> sometimes.

One idea after reading your reply. If we run out of the
nonfull_cluster[order], we should be able to use other cpu's
si->percpu_cluster[] as well. That is a very small win for Android,
because android does not have too many cpu. We are talking about a
handful of clusters, which might not justify the code complexity. It
does not change the behavior that order 0 can pollut higher order.

>
> Another issue is nonfull_cluster[order1] cannot be used for
> nonfull_cluster[order2].  In definition, we should not fail order 0
> allocation, we need to steal nonfull_cluster[order>0] for order 0
> allocation.  This can avoid to scan swap_map[] too.  This may be not
> perfect, but it is the simplest first step implementation.  You can
> optimize based on it further.

Yes, that is listed as the limitation of this cluster order approach.
Initially we need to support one order well first. We might choose
what order that is, 16K or 64K folio. 4K pages are too small, 2M pages
are too big. The sweet spot might be some there in between.  If we can
support one order well, we can demonstrate the value of the mTHP. We
can worry about other mix orders later.

Do you have any suggestions for how to prevent the order 0 polluting
the higher order cluster? If we allow that to happen, then it defeats
the goal of being able to allocate higher order swap entries. The
tricky question is we don't know how much swap space we should reserve
for each order. We can always break higher order clusters to lower
order, but can't do the reserves. The current patch series lets the
actual usage determine the percentage of the cluster for each order.
However that seems not enough for the test case Barry has. When the
app gets OOM kill that is where a large swing of order 0 swap will
show up and not enough higher order usage for the brief moment. The
order 0 swap entry will pollute the high order cluster. We are
currently debating a "knob" to be able to reserve a certain % of swap
space for a certain order. Those reservations will be guaranteed and
order 0 swap entry can't pollute them even when it runs out of swap
space. That can make the mTHP at least usable for the Android case.

Do you see another way to protect the high order cluster polluted by
lower order one?

>
> And, I checked your code again.  It appears that si->percpu_cluster may
> be put in si->nonfull_cluster[], then be used by another CPU.  Please
> check it.

Ah, good point. I think it does. Let me take a closer look.

Chris

Chris

next prev parent reply	other threads:[~2024-05-30 21:44 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-24 17:17 Chris Li
2024-05-24 17:17 ` [PATCH 1/2] mm: swap: swap cluster switch to double link list Chris Li
2024-05-28 16:23   ` Kairui Song
2024-05-28 22:27     ` Chris Li
2024-05-29  0:50       ` Chris Li
2024-05-29  8:46   ` Huang, Ying
2024-05-30 21:49     ` Chris Li
2024-05-31  2:03       ` Huang, Ying
2024-05-24 17:17 ` [PATCH 2/2] mm: swap: mTHP allocate swap entries from nonfull list Chris Li
2024-06-07 10:35   ` Ryan Roberts
2024-06-07 10:57     ` Ryan Roberts
2024-06-07 20:53       ` Chris Li
2024-06-07 20:52     ` Chris Li
2024-06-10 11:18       ` Ryan Roberts
2024-06-11  6:09         ` Chris Li
2024-05-28  3:07 ` [PATCH 0/2] mm: swap: mTHP swap allocator base on swap cluster order Barry Song
2024-05-28 21:04 ` Chris Li
2024-05-29  8:55   ` Huang, Ying
2024-05-30  1:13     ` Chris Li
2024-05-30  2:52       ` Huang, Ying
2024-05-30  8:08         ` Kairui Song
2024-05-30 18:31           ` Chris Li
2024-05-30 21:44         ` Chris Li [this message]
2024-05-31  2:35           ` Huang, Ying
2024-05-31 12:40             ` Kairui Song
2024-06-04  7:27               ` Huang, Ying
2024-06-05  7:40                 ` Chris Li
2024-06-05  7:30               ` Chris Li
2024-06-05  7:08             ` Chris Li
2024-06-06  1:55               ` Huang, Ying
2024-06-07 18:40                 ` Chris Li
2024-06-11  2:36                   ` Huang, Ying
2024-06-11  7:11                     ` Chris Li
2024-06-13  8:38                       ` Huang, Ying
2024-06-18  4:35                         ` Chris Li
2024-06-18  6:54                           ` Huang, Ying
2024-06-18  9:31                             ` Chris Li
2024-06-19  9:21                               ` Huang, Ying
2024-05-30  7:49   ` Barry Song
2024-06-07 10:49     ` Ryan Roberts
2024-06-07 18:57       ` Chris Li
2024-06-07  9:43 ` Ryan Roberts
2024-06-07 18:48   ` Chris Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAF8kJuMc3sXKarq3hMPYGFfeqyo81Q63HrE0XtztK9uQkcZacA@mail.gmail.com \
    --to=chrisl@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=kasong@tencent.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ryan.roberts@arm.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox