linux-mm.kvack.org archive mirror
* [PATCH v4 0/4] mm/swap, memcg: Introduce swap tiers for cgroup based swap control
@ 2026-02-17  0:09 Youngjun Park
From: Youngjun Park @ 2026-02-17  0:09 UTC
  To: Andrew Morton
  Cc: Chris Li, linux-mm, Kairui Song, Kemeng Shi, Nhat Pham,
	Baoquan He, Barry Song, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song, Michal Koutný,
	gunho.lee, taejoon.song, austin.kim, youngjun.park

This is the fourth version of the "Swap Tiers" concept.
Following Chris Li's suggestion to focus on small, mergeable
steps, this series covers the core tier infrastructure and
memcg-based tier assignment as a minimal usable feature set.
Further extensions are deferred to subsequent series.

Previous versions:
  RFC v3: https://lore.kernel.org/linux-mm/20260131125454.3187546-1-youngjun.park@lge.com/
  RFC v2: https://lore.kernel.org/linux-mm/20260126065242.1221862-1-youngjun.park@lge.com/
  RFC v1: https://lore.kernel.org/linux-mm/20251109124947.1101520-1-youngjun.park@lge.com/

Overview (Recap)
================
Swap Tiers enable grouping swap devices into named tiers based on
performance characteristics (e.g., NVMe, HDD, Network). This allows
faster devices to be dedicated to latency-sensitive workloads while
slower devices serve background tasks. The concept was suggested by
Chris Li.
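
As a rough illustration of the idea (not code from this series), tier
filtering at allocation time can be pictured as a bitmask test per
device. Everything below is a hypothetical sketch: struct
swap_dev_sketch, pick_swap_dev(), and the assumption that the device
array is already sorted from highest to lowest priority are invented
for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a swap device. */
struct swap_dev_sketch {
	const char *name;
	int tier;                 /* index of the tier this device belongs to (< 32) */
	unsigned long free_slots; /* remaining swap slots */
};

/*
 * Return the first device that has free slots and whose tier bit is set
 * in allowed_mask, or NULL if no device qualifies. The array is assumed
 * to be ordered from highest to lowest priority.
 */
static const struct swap_dev_sketch *
pick_swap_dev(const struct swap_dev_sketch *devs, size_t n,
	      uint32_t allowed_mask)
{
	for (size_t i = 0; i < n; i++) {
		if (!(allowed_mask & (UINT32_C(1) << devs[i].tier)))
			continue; /* tier excluded for this cgroup */
		if (devs[i].free_slots == 0)
			continue; /* device is full */
		return &devs[i];
	}
	return NULL;
}
```

With devices in tiers 0 (NVMe), 1 (HDD), and 2 (network), a mask of 0x6
would skip the NVMe device and fall through to the HDD device.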

Changes in v4
=============
- Simplified control flow to flatten indentation (Chris Li)
- Added CONFIG option for MAX_SWAPTIER with a small default of 4
  (Chris Li)
- Added memory.swap.tiers.effective read interface, following cpuset
  convention of splitting into configuration and effective files
  (Michal Koutný)
- Refined the cgroup documentation (Michal Koutný)
- Reworked save/restore logic into a clearer "snapshot and rollback"
  model for improved readability and simpler control flow (Chris Li)
- Removed tier priority modification operation to reduce complexity;
  may be revisited in a future series
- Added tier name validation: only alphanumeric characters and
  underscores are allowed
- Fixed several edge case bugs
- Swap allocation logic: integrating the percpu global cluster swap
  cache onto the swap device will be handled as part of Kairui Song's
  ongoing work, so that logic is dropped from this series.
- Rebased onto latest mm-new
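
The tier name rule listed above (alphanumeric characters and
underscores only) could be checked along these lines. This is a
userspace sketch for illustration, not the code in mm/swap_tier.c; the
function name tier_name_valid() and the TIER_NAME_MAX limit are
assumptions made for this example.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit (including the NUL terminator); not from the series. */
#define TIER_NAME_MAX 16

/*
 * Return true if name is non-empty, fits in TIER_NAME_MAX bytes including
 * the terminator, and contains only letters, digits, and underscores.
 */
static bool tier_name_valid(const char *name)
{
	size_t i;

	if (!name || name[0] == '\0')
		return false;

	for (i = 0; name[i] != '\0'; i++) {
		unsigned char c = (unsigned char)name[i];

		if (i >= TIER_NAME_MAX - 1)
			return false; /* too long */
		if (!isalnum(c) && c != '_')
			return false; /* disallowed character */
	}
	return true;
}
```

Under these assumptions "nvme_0" is accepted, while "fast-tier" and the
empty string are rejected.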

Deferred and Future work:
- Per-tier swap_active_head to reduce contention across tiers when
  releasing swap entries on different tiers (Chris Li). This is an
  improvement to swap_avail_head / swap_active_head that must be done
  eventually, but it is not critical for the initial infrastructure.

- Round-robin rotation (Kairui) cleanup will be proposed after
  this series lands, as swap tiers can naturally abstract away
  round-robin behavior. Round-robin is unnecessary when no
  equal-priority devices exist, so it could possibly be disabled,
  and the round-robin priority could also be made selectable.

- BPF interfaces (Shakeel Butt) and uses beyond memcg (including
  per-VMA, DAMON, etc.) are potential future extensions once the base
  infrastructure is established and real-world use cases emerge.

Changes in RFC v3
=================
- Fixed swap_alloc_fast() tier eligibility check
- Fixed tier_mask restoration on error paths
- Fixed priority -1 tier deletion bug
- Fixed !CONFIG_MEMCG build failures
- Improved commit messages
- Fixed improper error handling
- Fixed coding style violations
- Fixed tier deletion propagation to cgroups

Changes in RFC v2
=================
- Strict cgroup hierarchy compliance (LPC 2025 feedback)
- Percpu swap device cache to preserve fastpath performance
  (Kairui Song, Baoquan He)
- Simplified tier structure (Chris Li)
- Removed explicit "+" selection; default is all tiers, use "-"
  to exclude (Chris Li)
- Removed CONFIG_SWAP_TIER; now base kernel feature (Chris Li)
- Effective tier calculation moved to configuration time
  (swap.tiers write)
- Mixed operation support for "+" and "-" in
  /sys/kernel/mm/swap/tiers (Chris Li)
- Commit reorganization for clarity (Chris Li)
- Added tier priority modification support
- Added documentation for swap tiers concept and usage (Chris Li)
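
One way to picture the combination of "default is all tiers, use '-' to
exclude", strict hierarchy compliance, and computing the effective
tiers at configuration time is the sketch below. It is not taken from
the patches: struct cg_sketch, cg_update_effective(), and the rule that
a child can only narrow its parent's effective set are assumptions for
illustration. MAX_SWAPTIER is set to 4 to match the default mentioned
for v4.

```c
#include <stdint.h>

#define MAX_SWAPTIER 4
#define ALL_TIERS ((uint32_t)((1u << MAX_SWAPTIER) - 1))

/* Hypothetical, simplified stand-in for per-cgroup tier state. */
struct cg_sketch {
	const struct cg_sketch *parent; /* NULL for the root cgroup */
	uint32_t excluded;              /* tiers removed with "-" in this cgroup */
	uint32_t effective;             /* cached result, updated on write */
};

/*
 * Recompute the effective mask when the configuration is written:
 * start from the parent's effective mask (all tiers for the root) and
 * clear the tiers this cgroup excludes. A child can never regain a tier
 * that an ancestor excluded.
 */
static uint32_t cg_update_effective(struct cg_sketch *cg)
{
	uint32_t inherited = cg->parent ? cg->parent->effective : ALL_TIERS;

	cg->effective = inherited & ~cg->excluded & ALL_TIERS;
	return cg->effective;
}
```

In this model, if a parent excludes tier 0 and its child excludes
tier 2, the child's effective mask is 0xA (tiers 1 and 3). A real
implementation would also have to propagate a parent's change down to
all of its descendants.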

Real-world Results
==================
App preloading on our internal platform using NBD as a separate tier.

Without a separate swap tier:
- Cannot selectively avoid the default flash swap, so flash wear and
  the resulting lifespan issues cannot be reduced.
- Cannot selectively assign NBD to specific apps that need it.

Result (cold launch vs. preloaded):
- Streaming App A: 13.17s → 4.18s (68% faster)
- Streaming App B: 5.60s → 1.12s (80% faster)
- E-commerce App C: 10.25s → 2.00s (80% faster)

Performance validation against baseline (no tiers configured) shows
negligible overhead (<1%) in kernel build and vm-scalability
benchmarks. Detailed results are in the RFC v2 cover letter.

Youngjun Park (4):
  mm: swap: introduce swap tier infrastructure
  mm: swap: associate swap devices with tiers
  mm: memcontrol: add interfaces for swap tier selection
  mm: swap: filter swap allocation by memcg tier mask

 Documentation/admin-guide/cgroup-v2.rst |  27 ++
 Documentation/mm/swap-tier.rst          | 159 +++++++++
 MAINTAINERS                             |   3 +
 include/linux/memcontrol.h              |   3 +-
 include/linux/swap.h                    |   1 +
 mm/Kconfig                              |  12 +
 mm/Makefile                             |   2 +-
 mm/memcontrol.c                         |  95 +++++
 mm/swap.h                               |   4 +
 mm/swap_state.c                         |  75 ++++
 mm/swap_tier.c                          | 451 ++++++++++++++++++++++++
 mm/swap_tier.h                          |  74 ++++
 mm/swapfile.c                           |  22 +-
 13 files changed, 922 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/mm/swap-tier.rst
 create mode 100644 mm/swap_tier.c
 create mode 100644 mm/swap_tier.h

base-commit: 776250964cbaa49ebe6b8bb2870765cc89cece59
-- 
2.34.1




Thread overview: 7+ messages
2026-02-17  0:09 [PATCH v4 0/4] mm/swap, memcg: Introduce swap tiers for cgroup based swap control Youngjun Park
2026-02-17  0:09 ` [PATCH v4 1/4] mm: swap: introduce swap tier infrastructure Youngjun Park
2026-02-17 15:27   ` kernel test robot
2026-02-17  0:09 ` [PATCH v4 2/4] mm: swap: associate swap devices with tiers Youngjun Park
2026-02-17  0:09 ` [PATCH v4 3/4] mm: memcontrol: add interfaces for swap tier selection Youngjun Park
2026-02-17 12:18   ` kernel test robot
2026-02-17  0:09 ` [PATCH v4 4/4] mm: swap: filter swap allocation by memcg tier mask Youngjun Park
