linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Ryan Roberts <ryan.roberts@arm.com>
To: Chris Li <chrisl@kernel.org>, Andrew Morton <akpm@linux-foundation.org>
Cc: Kairui Song <kasong@tencent.com>,
	"Huang, Ying" <ying.huang@intel.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Barry Song <baohua@kernel.org>
Subject: Re: [PATCH 0/2] mm: swap: mTHP swap allocator base on swap cluster order
Date: Fri, 7 Jun 2024 10:43:15 +0100	[thread overview]
Message-ID: <968fec1a-9a54-4b2d-a54c-653d84393c82@arm.com> (raw)
In-Reply-To: <20240524-swap-allocator-v1-0-47861b423b26@kernel.org>

Sorry I'm late to the discussion - I've been out for the last 3.5 weeks and just
getting through my mail now...


On 24/05/2024 18:17, Chris Li wrote:
> This is the short term solutiolns "swap cluster order" listed
> in my "Swap Abstraction" discussion slice 8 in the recent
> LSF/MM conference.

I've read the article on lwn and look forward to watching the video once
available. The longer term plans look interesting.

> 
> When commit 845982eb264bc "mm: swap: allow storage of all mTHP
> orders" is introduced, it only allocates the mTHP swap entries
> from new empty cluster list. That works well for PMD size THP,
> but it has a serius fragmentation issue reported by Barry.

Yes, that was a deliberate initial approach to be conservative, just like the
original PMD-size THP support. I'm glad to see work to improve the situation!

> 
> https://lore.kernel.org/all/CAGsJ_4zAcJkuW016Cfi6wicRr8N9X+GJJhgMQdSMp+Ah+NSgNQ@mail.gmail.com/
> 
> The mTHP allocation failure rate raises to almost 100% after a few
> hours in Barry's test run.
> 
> The reason is that all the empty cluster has been exhausted while
> there are planty of free swap entries to in the cluster that is
> not 100% free.
> 
> Address this by remember the swap allocation order in the cluster.
> Keep track of the per order non full cluster list for later allocation.

I don't immediately see how this helps because memory is swapped back in
per-page (currently), so just because a given cluster was initially filled with
entries of a given order, doesn't mean that those entries are freed in atomic
units; only specific pages could have been swapped back in, meaning the holes
are not of the required order. Additionally, scanning could lead to order-0
pages being populated in random places.

My naive assumption was that the obvious way to solve this problem in the short
term would be to extend the scanning logic to be able to scan for an arbitrary
order. That way you could find an allocation of the required order in any of the
clusters, even a cluster that was not originally allocated for the required order.

I guess I should read your patches to understand exactly what you are doing
rather than making assumptions...

Thanks,
Ryan

> 
> This greatly improve the sucess rate of the mTHP swap allocation.
> While I am still waiting for Barry's test result. I paste Kairui's test
> result here:
> 
> I'm able to reproduce such an issue with a simple script (enabling all order of mthp):
> 
> modprobe brd rd_nr=1 rd_size=$(( 10 * 1024 * 1024))
> swapoff -a
> mkswap /dev/ram0
> swapon /dev/ram0
> 
> rmdir /sys/fs/cgroup/benchmark
> mkdir -p /sys/fs/cgroup/benchmark
> cd /sys/fs/cgroup/benchmark
> echo 8G > memory.max
> echo $$ > cgroup.procs
> 
> memcached -u nobody -m 16384 -s /tmp/memcached.socket -a 0766 -t 32 -B binary &
> 
> /usr/local/bin/memtier_benchmark -S /tmp/memcached.socket \
>         -P memcache_binary -n allkeys --key-minimum=1 \
>         --key-maximum=18000000 --key-pattern=P:P -c 1 -t 32 \
>         --ratio 1:0 --pipeline 8 -d 1024
> 
> Before:
> Totals      48805.63         0.00         0.00         5.26045         1.19100        38.91100        59.64700     51063.98
> After:
> Totals      71098.84         0.00         0.00         3.60585         0.71100        26.36700        39.16700     74388.74
> 
> And the fallback ratio dropped by a lot:
> Before:
> hugepages-32kB/stats/anon_swpout_fallback:15997
> hugepages-32kB/stats/anon_swpout:18712
> hugepages-512kB/stats/anon_swpout_fallback:192
> hugepages-512kB/stats/anon_swpout:0
> hugepages-2048kB/stats/anon_swpout_fallback:2
> hugepages-2048kB/stats/anon_swpout:0
> hugepages-1024kB/stats/anon_swpout_fallback:0
> hugepages-1024kB/stats/anon_swpout:0
> hugepages-64kB/stats/anon_swpout_fallback:18246
> hugepages-64kB/stats/anon_swpout:17644
> hugepages-16kB/stats/anon_swpout_fallback:13701
> hugepages-16kB/stats/anon_swpout:18234
> hugepages-256kB/stats/anon_swpout_fallback:8642
> hugepages-256kB/stats/anon_swpout:93
> hugepages-128kB/stats/anon_swpout_fallback:21497
> hugepages-128kB/stats/anon_swpout:7596
> 
> (Still collecting more data, the success swpout was mostly done early, then the fallback began to increase, nearly 100% failure rate)
> 
> After:
> hugepages-32kB/stats/swpout:34445
> hugepages-32kB/stats/swpout_fallback:0
> hugepages-512kB/stats/swpout:1
> hugepages-512kB/stats/swpout_fallback:134
> hugepages-2048kB/stats/swpout:1
> hugepages-2048kB/stats/swpout_fallback:1
> hugepages-1024kB/stats/swpout:6
> hugepages-1024kB/stats/swpout_fallback:0
> hugepages-64kB/stats/swpout:35495
> hugepages-64kB/stats/swpout_fallback:0
> hugepages-16kB/stats/swpout:32441
> hugepages-16kB/stats/swpout_fallback:0
> hugepages-256kB/stats/swpout:2223
> hugepages-256kB/stats/swpout_fallback:6278
> hugepages-128kB/stats/swpout:29136
> hugepages-128kB/stats/swpout_fallback:52
> 
> Reported-by: Barry Song <21cnbao@gmail.com>
> Tested-by: Kairui Song <kasong@tencent.com>
> Signed-off-by: Chris Li <chrisl@kernel.org>
> ---
> Chris Li (2):
>       mm: swap: swap cluster switch to double link list
>       mm: swap: mTHP allocate swap entries from nonfull list
> 
>  include/linux/swap.h |  18 ++--
>  mm/swapfile.c        | 252 +++++++++++++++++----------------------------------
>  2 files changed, 93 insertions(+), 177 deletions(-)
> ---
> base-commit: c65920c76a977c2b73c3a8b03b4c0c00cc1285ed
> change-id: 20240523-swap-allocator-1534c480ece4
> 
> Best regards,



  parent reply	other threads:[~2024-06-07  9:43 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-24 17:17 Chris Li
2024-05-24 17:17 ` [PATCH 1/2] mm: swap: swap cluster switch to double link list Chris Li
2024-05-28 16:23   ` Kairui Song
2024-05-28 22:27     ` Chris Li
2024-05-29  0:50       ` Chris Li
2024-05-29  8:46   ` Huang, Ying
2024-05-30 21:49     ` Chris Li
2024-05-31  2:03       ` Huang, Ying
2024-05-24 17:17 ` [PATCH 2/2] mm: swap: mTHP allocate swap entries from nonfull list Chris Li
2024-06-07 10:35   ` Ryan Roberts
2024-06-07 10:57     ` Ryan Roberts
2024-06-07 20:53       ` Chris Li
2024-06-07 20:52     ` Chris Li
2024-06-10 11:18       ` Ryan Roberts
2024-06-11  6:09         ` Chris Li
2024-05-28  3:07 ` [PATCH 0/2] mm: swap: mTHP swap allocator base on swap cluster order Barry Song
2024-05-28 21:04 ` Chris Li
2024-05-29  8:55   ` Huang, Ying
2024-05-30  1:13     ` Chris Li
2024-05-30  2:52       ` Huang, Ying
2024-05-30  8:08         ` Kairui Song
2024-05-30 18:31           ` Chris Li
2024-05-30 21:44         ` Chris Li
2024-05-31  2:35           ` Huang, Ying
2024-05-31 12:40             ` Kairui Song
2024-06-04  7:27               ` Huang, Ying
2024-06-05  7:40                 ` Chris Li
2024-06-05  7:30               ` Chris Li
2024-06-05  7:08             ` Chris Li
2024-06-06  1:55               ` Huang, Ying
2024-06-07 18:40                 ` Chris Li
2024-06-11  2:36                   ` Huang, Ying
2024-06-11  7:11                     ` Chris Li
2024-06-13  8:38                       ` Huang, Ying
2024-06-18  4:35                         ` Chris Li
2024-06-18  6:54                           ` Huang, Ying
2024-06-18  9:31                             ` Chris Li
2024-06-19  9:21                               ` Huang, Ying
2024-05-30  7:49   ` Barry Song
2024-06-07 10:49     ` Ryan Roberts
2024-06-07 18:57       ` Chris Li
2024-06-07  9:43 ` Ryan Roberts [this message]
2024-06-07 18:48   ` Chris Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=968fec1a-9a54-4b2d-a54c-653d84393c82@arm.com \
    --to=ryan.roberts@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=chrisl@kernel.org \
    --cc=kasong@tencent.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox