linux-mm.kvack.org archive mirror
From: Youngjun Park <youngjun.park@lge.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Chris Li" <chrisl@kernel.org>,
	linux-mm@kvack.org, "Kairui Song" <kasong@tencent.com>,
	"Kemeng Shi" <shikemeng@huaweicloud.com>,
	"Nhat Pham" <nphamcs@gmail.com>, "Baoquan He" <bhe@redhat.com>,
	"Barry Song" <baohua@kernel.org>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Roman Gushchin" <roman.gushchin@linux.dev>,
	"Shakeel Butt" <shakeel.butt@linux.dev>,
	"Muchun Song" <muchun.song@linux.dev>,
	"Michal Koutný" <mkoutny@suse.com>,
	gunho.lee@lge.com, taejoon.song@lge.com, austin.kim@lge.com,
	youngjun.park@lge.com
Subject: [PATCH v4 0/4] mm/swap, memcg: Introduce swap tiers for cgroup based swap control
Date: Tue, 17 Feb 2026 09:09:46 +0900
Message-ID: <20260217000950.4015880-1-youngjun.park@lge.com>

This is the fourth version of the "Swap Tiers" concept.
Following Chris Li's suggestion to focus on small, mergeable
steps, this series covers the core tier infrastructure and
memcg-based tier assignment as a minimal usable feature set.
Further extensions are deferred to subsequent series.

Previous versions:
  RFC v3: https://lore.kernel.org/linux-mm/20260131125454.3187546-1-youngjun.park@lge.com/
  RFC v2: https://lore.kernel.org/linux-mm/20260126065242.1221862-1-youngjun.park@lge.com/
  RFC v1: https://lore.kernel.org/linux-mm/20251109124947.1101520-1-youngjun.park@lge.com/

Overview (Recap)
================
Swap Tiers enable grouping swap devices into named tiers based on
performance characteristics (e.g., NVMe, HDD, Network). This allows
faster devices to be dedicated to latency-sensitive workloads while
slower devices serve background tasks. The concept was suggested by
Chris Li.
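To make the idea concrete, here is a minimal C model of tier-filtered
device selection. It is a sketch under stated assumptions, not the
kernel code: `struct model_swap_dev`, `model_pick_dev()` and
`MODEL_MAX_TIERS` are hypothetical names, each device is assumed to
carry an explicit tier index, and a cgroup's allowed tiers are assumed
to be a bitmask. Among eligible devices it picks the highest priority
(as with `swapon -p`, a larger value is preferred); fallback when a
device is full and round-robin among equal priorities are left out.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_TIERS 4 /* hypothetical tier limit for this model */

/* Hypothetical model: every swap device belongs to exactly one tier. */
struct model_swap_dev {
	const char *name;
	int prio; /* higher value = preferred, as with swapon -p */
	int tier; /* tier index, 0 .. MODEL_MAX_TIERS - 1 */
};

/*
 * Return the highest-priority device whose tier bit is set in
 * tier_mask, or NULL if no device is eligible. Devices need not be
 * sorted by priority.
 */
static const struct model_swap_dev *
model_pick_dev(const struct model_swap_dev *devs, size_t n,
	       unsigned int tier_mask)
{
	const struct model_swap_dev *best = NULL;

	for (size_t i = 0; i < n; i++) {
		assert(devs[i].tier >= 0 && devs[i].tier < MODEL_MAX_TIERS);
		if (!(tier_mask & (1u << devs[i].tier)))
			continue; /* tier excluded for this cgroup */
		if (!best || devs[i].prio > best->prio)
			best = &devs[i];
	}
	return best;
}
```

For example, with an NVMe device at priority 100 in tier 0 and an NBD
device at priority 50 in tier 2, a cgroup whose mask excludes tier 0
would be served from the NBD device.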

Changes in v4
=============
- Simplified control flow to flatten indentation (Chris Li)
- Added CONFIG option for MAX_SWAPTIER with a small default of 4
  (Chris Li)
- Added memory.swap.tiers.effective read interface, following cpuset
  convention of splitting into configuration and effective files
  (Michal Koutný)
- Refined the cgroup documentation (Michal Koutný)
- Reworked save/restore logic into a clearer "snapshot and rollback"
  model for improved readability and simpler control flow (Chris Li)
- Removed tier priority modification operation to reduce complexity;
  may be revisited in a future series
- Added tier name validation: only alphanumeric characters and
  underscores are allowed
- Fixed several edge case bugs
- Dropped the swap allocation changes from this series: integrating
  the percpu global cluster swap cache into the swap device will be
  handled as part of Kairui Song's ongoing work.
- Rebased onto latest mm-new
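The "snapshot and rollback" model and the tier name rule can be
illustrated with a small, self-contained C sketch. Everything here is
hypothetical: the `+name` / `-name` token format (without priorities),
`MODEL_NAME_LEN`, and all `model_*` names are assumptions for
illustration; only the character rule (alphanumerics and underscores)
and the default limit of 4 tiers come from the list above. The table is
copied before a batch starts, and any invalid name, duplicate, missing
tier, or full table restores that copy, so a batch is all-or-nothing.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_MAX_SWAPTIER 4 /* mirrors the small default of 4 tiers */
#define MODEL_NAME_LEN 16    /* hypothetical maximum tier name length */

/* Hypothetical global tier table: slot i is in use if used[i]. */
struct model_tier_table {
	char names[MODEL_MAX_SWAPTIER][MODEL_NAME_LEN + 1];
	bool used[MODEL_MAX_SWAPTIER];
};

/* Tier names: non-empty, bounded length, only [A-Za-z0-9_]. */
static bool model_name_valid(const char *name)
{
	size_t len = strlen(name);

	if (len == 0 || len > MODEL_NAME_LEN)
		return false;
	for (size_t i = 0; i < len; i++)
		if (!isalnum((unsigned char)name[i]) && name[i] != '_')
			return false;
	return true;
}

static int model_find(const struct model_tier_table *t, const char *name)
{
	for (int i = 0; i < MODEL_MAX_SWAPTIER; i++)
		if (t->used[i] && strcmp(t->names[i], name) == 0)
			return i;
	return -1;
}

/*
 * Apply "+name" (add) and "-name" (remove) operations in order.
 * Any failure restores the snapshot taken on entry.
 * Returns 0 on success, -1 on failure (table unchanged).
 */
static int model_apply_ops(struct model_tier_table *t,
			   const char *const *ops, int nops)
{
	assert(t != NULL && (ops != NULL || nops == 0));
	struct model_tier_table snapshot = *t;

	for (int i = 0; i < nops; i++) {
		const char *name = ops[i] + 1;
		int idx;

		if ((ops[i][0] != '+' && ops[i][0] != '-') ||
		    !model_name_valid(name))
			goto rollback;
		idx = model_find(t, name);
		if (ops[i][0] == '-') {
			if (idx < 0)
				goto rollback; /* no such tier */
			t->used[idx] = false;
			continue;
		}
		if (idx >= 0)
			goto rollback; /* duplicate name */
		for (idx = 0; idx < MODEL_MAX_SWAPTIER && t->used[idx]; idx++)
			; /* find a free slot */
		if (idx == MODEL_MAX_SWAPTIER)
			goto rollback; /* table is full */
		strcpy(t->names[idx], name);
		t->used[idx] = true;
	}
	return 0;

rollback:
	*t = snapshot;
	return -1;
}
```

Taking a whole-table copy keeps the error path to a single assignment,
which is the readability argument for the snapshot model; a real
implementation may snapshot less state.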

Deferred and Future work:
- Per-tier swap_active_head to reduce contention across tiers when
  releasing swap entries on different tiers (Chris Li). This is a
  needed improvement to swap_avail_head / swap_active_head, but it is
  not critical for the initial infrastructure.

- A cleanup of round-robin rotation (Kairui Song) will be proposed
  after this series lands, since swap tiers can naturally abstract
  away round-robin behavior: round-robin is unnecessary when no
  equal-priority devices exist, so it could be disabled in that case,
  and round-robin could also be made selectable per priority.

- BPF interfaces (Shakeel Butt) and extensions beyond memcg
  (including per-VMA, DAMON, etc.) are potential future work once the
  base infrastructure is established and real-world use cases are
  identified.

Changes in RFC v3
=================
- Fixed swap_alloc_fast() tier eligibility check
- Fixed tier_mask restoration on error paths
- Fixed priority -1 tier deletion bug
- Fixed !CONFIG_MEMCG build failures
- Improved commit messages
- Fixed improper error handling
- Fixed coding style violations
- Fixed tier deletion propagation to cgroups

Changes in RFC v2
=================
- Strict cgroup hierarchy compliance (LPC 2025 feedback)
- Percpu swap device cache to preserve fastpath performance
  (Kairui Song, Baoquan He)
- Simplified tier structure (Chris Li)
- Removed explicit "+" selection; default is all tiers, use "-"
  to exclude (Chris Li)
- Removed CONFIG_SWAP_TIER; now base kernel feature (Chris Li)
- Effective tier calculation moved to configuration time
  (swap.tiers write)
- Mixed operation support for "+" and "-" in
  /sys/kernel/mm/swap/tiers (Chris Li)
- Commit reorganization for clarity (Chris Li)
- Added tier priority modification support
- Added documentation for swap tiers concept and usage (Chris Li)
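The "default is all tiers, use '-' to exclude" rule and an effective
tier set computed at configuration time can be modeled as bitmask
operations. This C sketch assumes a fixed, hypothetical registry of
four tier names and tokens such as `-HDD`; the real `memory.swap.tiers`
syntax and parsing may differ. It also assumes the cpuset-like rule
that a cgroup's effective mask is its configured mask intersected with
its parent's effective mask, so a child can only narrow what its
ancestors allow.

```c
#include <assert.h>
#include <string.h>

#define MODEL_MAX_SWAPTIER 4
#define MODEL_ALL_TIERS ((1u << MODEL_MAX_SWAPTIER) - 1)

/* Hypothetical registry: bit i of a mask stands for model_tier_names[i]. */
static const char *const model_tier_names[MODEL_MAX_SWAPTIER] = {
	"NVMe", "SSD", "HDD", "NBD",
};

static int model_tier_index(const char *name)
{
	for (int i = 0; i < MODEL_MAX_SWAPTIER; i++)
		if (strcmp(model_tier_names[i], name) == 0)
			return i;
	return -1;
}

/*
 * Start from "all tiers" and clear one bit per "-name" token.
 * Returns 0 and stores the mask, or -1 for a malformed token or an
 * unknown tier (leaving *mask untouched).
 */
static int model_configured_mask(const char *const *tokens, int n,
				 unsigned int *mask)
{
	unsigned int m = MODEL_ALL_TIERS;

	for (int i = 0; i < n; i++) {
		int idx;

		if (tokens[i][0] != '-')
			return -1;
		idx = model_tier_index(tokens[i] + 1);
		if (idx < 0)
			return -1;
		m &= ~(1u << idx);
	}
	*mask = m;
	return 0;
}

/* A child can only narrow what its parent effectively allows. */
static unsigned int model_effective_mask(unsigned int configured,
					 unsigned int parent_effective)
{
	assert((configured & ~MODEL_ALL_TIERS) == 0);
	return configured & parent_effective;
}
```

With this model, a parent that excludes HDD and a child that excludes
NVMe leave the child with SSD and NBD (mask 0xA).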

Real-world Results
==================
Use case: app preloading on our internal platform, using NBD as a
separate swap tier.

Without a separate swap tier:
- The default flash swap cannot be selectively avoided, so flash wear
  and lifespan issues cannot be reduced.
- NBD cannot be selectively assigned to the specific apps that need it.

Results (launch time, cold vs. preloaded):
- Streaming App A: 13.17s → 4.18s (68% shorter launch time)
- Streaming App B: 5.60s → 1.12s (80% shorter launch time)
- E-commerce App C: 10.25s → 2.00s (80% shorter launch time)
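The percentages are consistent with the reduction in launch time,
(cold - preloaded) / cold, rather than a speedup ratio. A quick integer
check in C (times in hundredths of a second; the helper names are
illustrative only) reproduces 68%, 80% and 80%, and gives speedups of
roughly 3.2x, 5.0x and 5.1x for the same measurements.

```c
#include <assert.h>

/* Rounded launch-time reduction in percent; inputs in 1/100 s. */
static unsigned int model_reduction_pct(unsigned int cold_cs,
					unsigned int pre_cs)
{
	assert(cold_cs > 0 && pre_cs <= cold_cs);
	return ((cold_cs - pre_cs) * 100 + cold_cs / 2) / cold_cs;
}

/* Rounded speedup factor in tenths (e.g. 32 means 3.2x). */
static unsigned int model_speedup_x10(unsigned int cold_cs,
				      unsigned int pre_cs)
{
	assert(pre_cs > 0);
	return (cold_cs * 10 + pre_cs / 2) / pre_cs;
}
```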

Performance validation against the baseline (no tiers configured)
shows negligible overhead (<1%) in kernel build and vm-scalability
benchmarks. Detailed results are in the RFC v2 cover letter.

Youngjun Park (4):
  mm: swap: introduce swap tier infrastructure
  mm: swap: associate swap devices with tiers
  mm: memcontrol: add interfaces for swap tier selection
  mm: swap: filter swap allocation by memcg tier mask

 Documentation/admin-guide/cgroup-v2.rst |  27 ++
 Documentation/mm/swap-tier.rst          | 159 +++++++++
 MAINTAINERS                             |   3 +
 include/linux/memcontrol.h              |   3 +-
 include/linux/swap.h                    |   1 +
 mm/Kconfig                              |  12 +
 mm/Makefile                             |   2 +-
 mm/memcontrol.c                         |  95 +++++
 mm/swap.h                               |   4 +
 mm/swap_state.c                         |  75 ++++
 mm/swap_tier.c                          | 451 ++++++++++++++++++++++++
 mm/swap_tier.h                          |  74 ++++
 mm/swapfile.c                           |  22 +-
 13 files changed, 922 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/mm/swap-tier.rst
 create mode 100644 mm/swap_tier.c
 create mode 100644 mm/swap_tier.h

base-commit: 776250964cbaa49ebe6b8bb2870765cc89cece59
-- 
2.34.1


