From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Chris Li <chrisl@kernel.org>, Barry Song <v-songbaohua@oppo.com>,
	Hugh Dickins <hughd@google.com>,
	Yosry Ahmed <yosryahmed@google.com>,
	"Huang, Ying" <ying.huang@linux.alibaba.com>,
	Baoquan He <bhe@redhat.com>, Nhat Pham <nphamcs@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	Kalesh Singh <kaleshsingh@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	linux-kernel@vger.kernel.org, Kairui Song <kasong@tencent.com>
Subject: [PATCH v3 0/7] mm, swap: remove swap slot cache
Date: Fri, 14 Mar 2025 00:59:28 +0800
Message-ID: <20250313165935.63303-1-ryncsn@gmail.com>

From: Kairui Song <kasong@tencent.com>

Slot cache was initially introduced by commit 67afa38e012e ("mm/swap:
add cache for swap slots allocation") to reduce the lock contention
of si->lock.

The previous series "mm, swap: rework of swap allocator locks" [1] removed
the swap slot cache from the freeing path, as the freeing path no longer
touches si->lock in most cases. The allocation path also has little to no
contention on si->lock since that series, but the slot cache still helps
to reduce other overheads, like counters and the plist.

This series removes the slot cache from the allocation path too, by using
the cluster as the allocation fast path and reducing the other overheads.
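
To illustrate the idea of a per-CPU cluster serving as the allocation
fast path, here is a minimal single-threaded userspace sketch. It is not
the kernel implementation: every name in it (toy_swap_device,
toy_alloc_slot, the TOY_* constants) is hypothetical, there is no locking,
and the slow path is a plain linear scan chosen only to keep the example
short.

```c
#include <assert.h>

/* Toy userspace model. Every name here is hypothetical, not kernel API. */
#define TOY_NR_CPUS            4
#define TOY_NR_CLUSTERS        8
#define TOY_SLOTS_PER_CLUSTER  16
#define TOY_NO_CLUSTER         (-1)
#define TOY_NO_SLOT            (-1L)

struct toy_cluster {
	int next_free;	/* slots [0, next_free) of this cluster are in use */
};

struct toy_swap_device {
	struct toy_cluster clusters[TOY_NR_CLUSTERS];
	int cached_cluster[TOY_NR_CPUS];	/* per-CPU "last used" hint */
};

void toy_device_init(struct toy_swap_device *dev)
{
	int i;

	for (i = 0; i < TOY_NR_CLUSTERS; i++)
		dev->clusters[i].next_free = 0;
	for (i = 0; i < TOY_NR_CPUS; i++)
		dev->cached_cluster[i] = TOY_NO_CLUSTER;
}

/* Take the next slot from one cluster, or TOY_NO_SLOT if it is full. */
static long toy_cluster_take(struct toy_swap_device *dev, int idx)
{
	struct toy_cluster *c = &dev->clusters[idx];

	if (c->next_free >= TOY_SLOTS_PER_CLUSTER)
		return TOY_NO_SLOT;
	return (long)idx * TOY_SLOTS_PER_CLUSTER + c->next_free++;
}

long toy_alloc_slot(struct toy_swap_device *dev, int cpu)
{
	int idx, pass;
	long slot;

	assert(cpu >= 0 && cpu < TOY_NR_CPUS);

	/* Fast path: reuse the cluster this CPU allocated from last time. */
	idx = dev->cached_cluster[cpu];
	if (idx != TOY_NO_CLUSTER) {
		slot = toy_cluster_take(dev, idx);
		if (slot != TOY_NO_SLOT)
			return slot;
	}

	/*
	 * Slow path: pass 0 only accepts empty clusters, so CPUs tend to
	 * get a cluster of their own; pass 1 accepts any cluster with room.
	 */
	for (pass = 0; pass < 2; pass++) {
		for (idx = 0; idx < TOY_NR_CLUSTERS; idx++) {
			if (pass == 0 && dev->clusters[idx].next_free != 0)
				continue;
			slot = toy_cluster_take(dev, idx);
			if (slot != TOY_NO_SLOT) {
				dev->cached_cluster[cpu] = idx;
				return slot;
			}
		}
	}

	/* Nothing left anywhere: drop the hint and report failure. */
	dev->cached_cluster[cpu] = TOY_NO_CLUSTER;
	return TOY_NO_SLOT;
}
```

In this model, repeated calls from the same CPU stay inside one cluster
until it fills up, and only then fall back to the scan. That is the
property that lets a per-CPU cluster stand in for a separate slot cache.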

Now that the slot cache is completely gone, the code is much simpler, with
no obvious feature or performance change, and the related workarounds are
cleaned up. This should also avoid other potential issues, e.g. the long
pinning of swap slots: the swap slot cache pins swap slots with HAS_CACHE,
so reclaim or allocation fails to use these slots when scanning.

The only behavior change is the swap device allocation rotation
mechanism, as explained in the patch "mm, swap: use percpu cluster
as allocation fast path".

Test results are looking good after deleting the swap slot cache:

- vm-scalability with: `usemem --init-time -O -y -x -R -31 1G`,
12G memory cgroup using simulated pmem as SWAP (32G pmem, 32 CPUs),
16 test runs for each case, measuring the total throughput:

                      Before (KB/s) (stdev)  After (KB/s) (stdev)
Random (4K):          424907.60 (24410.78)   414745.92  (34554.78)
Random (64K):         163308.82 (11635.72)   167314.50  (18434.99)
Sequential (4K, !-R): 6150056.79 (103205.90) 6321469.06 (115878.16)

- Build linux kernel with make -j96, using 4K folio with 1.5G memory
cgroup limit and 64K folio with 2G memory cgroup limit, on top of tmpfs,
12 test runs, measuring the system time:

                  Before (s) (stdev)  After (s) (stdev)
make -j96 (4K):   6445.69 (61.95)     6408.80 (69.46)
make -j96 (64K):  6841.71 (409.04)    6437.99 (435.55)

The performance is unchanged, slightly better in some cases.
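
For readers who want to turn the tables above into relative numbers, a
small helper like the following can be used. It is only a sketch: the
function names are made up for this example, and comparing the difference
of the means against the larger standard deviation is a crude rule of
thumb, not a proper statistical test.

```c
#include <assert.h>
#include <stdbool.h>

/* Relative change from `before` to `after`, in percent. */
double pct_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}

/*
 * Crude noise check (an assumption of this sketch, not a method from
 * the series): true if the two means differ by no more than the larger
 * of the two reported standard deviations.
 */
bool within_noise(double before, double sd_before,
		  double after, double sd_after)
{
	double diff = after > before ? after - before : before - after;
	double sd = sd_before > sd_after ? sd_before : sd_after;

	return diff <= sd;
}
```

For example, pct_change(6841.71, 6437.99) for the make -j96 (64K) row is
about -5.9% system time (lower is better), and the 403.72 s difference is
smaller than either reported standard deviation. For the throughput table,
higher is better, so a positive result is an improvement.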

[1] https://lore.kernel.org/linux-mm/20250113175732.48099-1-ryncsn@gmail.com/

---

V2: https://lore.kernel.org/linux-mm/20250224180212.22802-1-ryncsn@gmail.com/
Updates from V2:
- Make folio_alloc_swap() inline to fix build error [Stephen Rothwell]
- Flush the global percpu cluster cache on swapoff to prevent new swapon
  devices using the old invalid values. Based on:
  https://lore.kernel.org/linux-mm/CAMgjq7AkRmb5ote-VZErM_2UdEC575j9WcrstcQOypEb+T-DLA@mail.gmail.com/
- Minor update for patch 5/7: in the slow path, also try the local cluster
  first to avoid fragmentation. This is an intermediate change to make
  testing easier if someone lands on this patch during a bisect; the final
  code after the whole series is applied is unchanged.
- Call mem_cgroup_try_charge_swap() even if swap allocation failed, so
  that cgroup events are still recorded.
- Collect reviews and minor improvements [Baoquan He].
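
The swapoff flush mentioned in the list above can be pictured with a toy
userspace model like the one below. This is a hedged sketch only: the
names (toy_device, toy_hints, toy_hint_flush_device) are hypothetical, the
layout of the real per-CPU cluster cache may differ, and the locking that
real code would need is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model. All names are hypothetical and not taken from the kernel. */
#define TOY_NR_CPUS   4
#define TOY_NR_ORDERS 3

struct toy_device {
	int id;
};

/* Per-CPU hint: for each order, the device and offset last allocated from. */
struct toy_percpu_hint {
	struct toy_device *dev[TOY_NR_ORDERS];
	unsigned long offset[TOY_NR_ORDERS];
};

/* Static storage starts zeroed, so every hint begins with dev == NULL. */
static struct toy_percpu_hint toy_hints[TOY_NR_CPUS];

void toy_hint_record(int cpu, int order, struct toy_device *dev,
		     unsigned long offset)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	assert(order >= 0 && order < TOY_NR_ORDERS);
	toy_hints[cpu].dev[order] = dev;
	toy_hints[cpu].offset[order] = offset;
}

/* Returns the cached device (and fills *offset), or NULL if no hint. */
struct toy_device *toy_hint_lookup(int cpu, int order, unsigned long *offset)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	assert(order >= 0 && order < TOY_NR_ORDERS);
	if (toy_hints[cpu].dev[order])
		*offset = toy_hints[cpu].offset[order];
	return toy_hints[cpu].dev[order];
}

/* On "swapoff": drop every hint that still points at the dying device. */
void toy_hint_flush_device(const struct toy_device *dev)
{
	int cpu, order;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		for (order = 0; order < TOY_NR_ORDERS; order++) {
			if (toy_hints[cpu].dev[order] == dev) {
				toy_hints[cpu].dev[order] = NULL;
				toy_hints[cpu].offset[order] = 0;
			}
		}
	}
}
```

Without such a flush, a hint recorded for a device that has gone away
could be picked up later and applied to a newly enabled device, which is
the stale-value problem the changelog entry describes.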

V1: https://lore.kernel.org/linux-mm/20250214175709.76029-1-ryncsn@gmail.com/
Updates from V1:
- Check the cluster with cluster_is_usable and cluster_is_empty in the
  fast path too, improving performance and avoiding fragmentation.
- Fix a build warning and error for !SWAP build reported by test bot.
- Global cluster array also records the device for each order [Baoquan He]
- Adjust comments and function name [Baoquan He]
- Collect Reviewed-by [Baoquan He]
- Minor function style improvement [Matthew Wilcox]

Kairui Song (7):
  mm, swap: avoid reclaiming irrelevant swap cache
  mm, swap: drop the flag TTRS_DIRECT
  mm, swap: avoid redundant swap device pinning
  mm, swap: don't update the counter up-front
  mm, swap: use percpu cluster as allocation fast path
  mm, swap: remove swap slot cache
  mm, swap: simplify folio swap allocation

 include/linux/swap.h       |  22 +--
 include/linux/swap_slots.h |  28 ---
 mm/Makefile                |   2 +-
 mm/shmem.c                 |  21 +--
 mm/swap.h                  |   6 -
 mm/swap_slots.c            | 295 ------------------------------
 mm/swap_state.c            |  79 +--------
 mm/swapfile.c              | 355 ++++++++++++++++++++-----------------
 mm/vmscan.c                |  16 +-
 mm/zswap.c                 |   6 +
 10 files changed, 232 insertions(+), 598 deletions(-)
 delete mode 100644 include/linux/swap_slots.h
 delete mode 100644 mm/swap_slots.c

-- 
2.48.1

Thread overview: 12+ messages
2025-03-13 16:59 Kairui Song [this message]
2025-03-13 16:59 ` [PATCH v3 1/7] mm, swap: avoid reclaiming irrelevant swap cache Kairui Song
2025-03-13 16:59 ` [PATCH v3 2/7] mm, swap: drop the flag TTRS_DIRECT Kairui Song
2025-03-13 16:59 ` [PATCH v3 3/7] mm, swap: avoid redundant swap device pinning Kairui Song
2025-03-13 16:59 ` [PATCH v3 4/7] mm, swap: don't update the counter up-front Kairui Song
2025-03-13 16:59 ` [PATCH v3 5/7] mm, swap: use percpu cluster as allocation fast path Kairui Song
2025-03-13 16:59 ` [PATCH v3 6/7] mm, swap: remove swap slot cache Kairui Song
2025-04-28 13:52   ` Heiko Carstens
2025-04-28 15:31     ` Kairui Song
2025-04-29  7:31       ` Heiko Carstens
2025-04-29  9:28         ` Kairui Song
2025-03-13 16:59 ` [PATCH v3 7/7] mm, swap: simplify folio swap allocation Kairui Song