* [LSF/MM/BPF TOPIC] Swap status and roadmap discussion
@ 2026-02-21 10:50 Kairui Song
From: Kairui Song @ 2026-02-21 10:50 UTC (permalink / raw)
  To: lsf-pc
  Cc: Kairui Song, Chris Li, YoungJun Park, Barry Song, Baoquan He,
	linux-mm, Nhat Pham, Johannes Weiner

Last year, we successfully cleaned up the swap subsystem using the swap
table design [1], but that's not the end of the story. Combined with the
layered swap table, ghost swap as posted by Chris, YoungJun's swap tiering
[2] [3], and Nhat's idea of having a dynamic swap size [4], we can have a
flexible, feature-rich swap. Importantly, both CPU and memory overhead
will be minimal for all users in all scenarios, and lower than in the old
swap system. Every component is runtime optional, configurable, and
highly compatible with future features (e.g. Baoquan's swapops [5], which
I just noticed and which should fit well here; swap table compaction based
on the full list should fit too).

We should be able to achieve a solution that users ranging from sub-GB
devices to TB-level servers will all benefit from.

Based on the swap table P4 RFC [6], we will achieve the following (see
that series for details):
- 8 bytes per slot memory usage for plain swap.
  - And can be reduced to 3 or only 1 byte.
- 16 bytes per slot memory usage, when using ghost / virtual zswap.
  - 24 bytes at most for multi-layer.
  - And can be reduced too by simply using the same infrastructure above.
- Minimal code review or maintenance burden. All layers are using the same
  infrastructure to manage the metadata/allocation/synchronization, making
  all APIs and conventions consistent and easy to maintain.
- Every component is minimal, runtime optional and high-performance so
  existing users of ZRAM or high performance devices have literally zero
  overhead.
- The ghost / virtual swapfile has a dynamic or infinite size with no
  static data overhead.
- Migration and compaction are also easily supportable as both reverse
  mapping and reallocation are prepared.
- Highly compatible with YoungJun's swap tier, because everything is just a
  device [2] [3].
- Solves large-order swapout and minimum swap order requirements.
- The fast swapoff feature is also supported by just reading the swap entry
  into the ghost / vswap's swap cache.
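
To put the per-slot numbers above in perspective, here is a rough
back-of-the-envelope sketch. It assumes 4 KiB pages with one swap slot per
page and that every slot of the device has metadata allocated, so it gives
an upper bound. The function and table names are made up for this example
and are not taken from the series.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, one swap slot per page

# Per-slot metadata sizes quoted in the list above (bytes).
BYTES_PER_SLOT = {
    "plain": 8,
    "plain, reduced": 3,
    "plain, minimal": 1,
    "ghost / virtual": 16,
    "multi-layer (max)": 24,
}


def metadata_bytes(swap_bytes, bytes_per_slot):
    """Metadata needed when every slot of a swap_bytes device is tracked."""
    slots = swap_bytes // PAGE_SIZE
    return slots * bytes_per_slot


if __name__ == "__main__":
    one_tib = 1 << 40
    for name, per_slot in BYTES_PER_SLOT.items():
        mib = metadata_bytes(one_tib, per_slot) / (1 << 20)
        print(f"{name:>18}: {mib:8.0f} MiB per TiB of swap")
```

Under these assumptions, 8 bytes per slot works out to 2 GiB of metadata
per TiB of fully populated swap, and 1 byte per slot to 256 MiB.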

Besides these, swap now has the opportunity for even further
optimizations, e.g. PG_drop for anon reclaim, since swap now has a unified
convention, and reducing rmap lock contention, as was once suggested by
Barry Song [7]. Growth of the static swap file can also be added later, so
plain swap on top of things like LVM can finally grow without causing
memory pressure.

There are also open design decisions that need discussion, such as:
- Should we use swapon / swapoff on the virtual / ghost device? Or expose
  it in other ways, or make it on by default? Using the classical swapon /
  off provides huge flexibility; on by default is also doable and hides
  complexity.
- Should we expose special devices like /dev/xswap, or just use a dummy
  swap header file?
- Should we report the usage of ghost / virtual swap devices as ordinary
  swap under /proc/swaps, and if so, how? We definitely need some way to
  report it.
- Is 64 bits really needed for reverse mapping? For context, reverse
  mapping here is a swap entry recorded in a lower / physical device
  pointing to the ghost / virtual device.
- The swap device size is now just a number. To adjust it we need an
  interface; what kind of interface is the best choice? Or should we just
  make it dynamic (e.g. increase by 2M for every cluster allocated)?
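
The last two questions can be made a bit more concrete with a toy model.
The sketch below assumes 4 KiB pages (so a 2M cluster holds 512 slots),
counts only the bits needed for a slot offset (ignoring any type or flag
bits a real swap entry carries), and never shrinks the device. All names
are hypothetical and only serve the illustration.

```python
PAGE_SIZE = 4096                                 # assumption: 4 KiB pages
CLUSTER_BYTES = 2 << 20                          # 2M growth step from the question above
SLOTS_PER_CLUSTER = CLUSTER_BYTES // PAGE_SIZE   # 512 under these assumptions


class DynamicSwapSize:
    """Toy model: the device size is just a number that grows per cluster."""

    def __init__(self):
        self.size_bytes = 0
        self.used_slots = 0

    def alloc_slot(self):
        # Grow by one cluster only when all existing slots are in use.
        if self.used_slots == self.size_bytes // PAGE_SIZE:
            self.size_bytes += CLUSTER_BYTES
        self.used_slots += 1
        return self.used_slots - 1


def rmap_bits_needed(device_bytes):
    """Bits needed to index every slot of a device of device_bytes."""
    slots = device_bytes // PAGE_SIZE
    return max(1, (slots - 1).bit_length())
```

In this model the 513th allocation grows the device from 2M to 4M, and a
32-bit slot offset is enough to cover a 16 TiB virtual device, which is
the kind of trade-off the 64-bit question is about.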

Link: https://lore.kernel.org/all/CAMgjq7BvQ0ZXvyLGp2YP96+i+6COCBBJCYmjXHGBnfisCAb8VA@mail.gmail.com/ [1]
Link: https://lore.kernel.org/linux-mm/CAMgjq7BA_2-5iCvS-vp9ZEoG=1DwHWYuVZOuH8DWH9wzdoC00g@mail.gmail.com/ [2]
Link: https://lore.kernel.org/linux-mm/20260217000950.4015880-1-youngjun.park@lge.com/ [3]
Link: https://lore.kernel.org/linux-mm/20260208215839.87595-1-nphamcs@gmail.com/ [4]
Link: https://lore.kernel.org/linux-mm/aZiFvzlBJiYBUDre@MiWiFi-R3L-srv/ [5]
Link: https://lore.kernel.org/linux-mm/20260220-swap-table-p4-v1-0-104795d19815@tencent.com/ [6]
Link: https://lore.kernel.org/linux-mm/20250513084620.58231-1-21cnbao@gmail.com/ [7]

