linux-mm.kvack.org archive mirror
* [PATCH mm-unstable v1 0/8] mm: multi-gen LRU: memcg LRU
@ 2022-12-01 22:39 Yu Zhao
From: Yu Zhao @ 2022-12-01 22:39 UTC
  To: Andrew Morton
  Cc: Johannes Weiner, Jonathan Corbet, Michael Larabel, Michal Hocko,
	Mike Rapoport, Roman Gushchin, Suren Baghdasaryan, linux-mm,
	linux-kernel, linux-mm, Yu Zhao

An memcg LRU is a per-node LRU of memcgs. It is also an LRU of LRUs,
since each node and memcg combination has an LRU of folios (see
mem_cgroup_lruvec()).

Its goal is to improve the scalability of global reclaim, which is
critical to systemwide memory overcommit in data centers. Note that
memcg reclaim is currently out of scope.

Its memory overhead is one pointer per LRU vector, which is negligible
for each node. In terms of traversing memcgs during global reclaim, it
improves the best-case complexity from O(n) to O(1) and does not
affect the worst-case complexity of O(n). Therefore, on average, it has
a sublinear complexity in contrast to the current linear complexity.

The basic structure of an memcg LRU can be understood by an analogy
to the active/inactive LRU (of folios):
1. It has the young and the old (generations);
2. Its linked lists have the head and the tail;
3. The increment of max_seq triggers promotion;
4. Other events, e.g., offlining an memcg, trigger similar
   operations.

In terms of global reclaim, it has two distinct features:
1. Sharding, which allows each thread to start at a random memcg (in
   the old generation) and improves parallelism;
2. Eventual fairness, which allows direct reclaim to bail out and
   reduces latency without affecting fairness over some time.
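
The idea behind eventual fairness can be illustrated with a
deliberately simplified toy. It is not the kernel's algorithm (the
workflow is described in patch 6) and it ignores generations and
sharding. The function name fair_rounds and its parameters are
hypothetical. Each round stands for one reclaimer that visits a few
memcgs from the head of a queue and then bails out; visited memcgs
rotate to the tail, so the next round resumes where the last stopped.

```shell
# Toy model only: a plain round-robin queue, not the kernel's memcg LRU.
# fair_rounds NR_MEMCGS ROUNDS PER_ROUND
#   Each round models one reclaimer that visits PER_ROUND memcgs from
#   the head of the queue and then bails out. A visited memcg rotates
#   to the tail, so the next round starts at the first unvisited one.
#   Prints "memcgN VISITS" for every memcg.
fair_rounds() {
    awk -v n="$1" -v rounds="$2" -v per="$3" 'BEGIN {
        head = 0
        for (r = 0; r < rounds; r++)
            for (k = 0; k < per; k++) {
                visits[head]++
                head = (head + 1) % n   # visited memcg goes to the tail
            }
        for (i = 0; i < n; i++)
            printf "memcg%d %d\n", i, visits[i]
    }'
}
```

With 4 memcgs and 2 visits per round, a single round touches only
memcg0 and memcg1, while `fair_rounds 4 6 2` ends with every memcg
visited the same number of times: unfair in the short term, fair over
some time.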

The commit message in patch 6 details the workflow:
https://lore.kernel.org/r/20221201223923.873696-7-yuzhao@google.com/

The following is a simple test to quickly verify its effectiveness.
More benchmarks are coming soon.

  Test design:
  1. Create multiple memcgs.
  2. Each memcg contains a job (fio).
  3. All jobs access the same amount of memory randomly.
  4. The system does not experience global memory pressure.
  5. Periodically write to the root memory.reclaim.

  Desired outcome:
  1. All memcgs have similar pgsteal, i.e.,
     stddev(pgsteal)/mean(pgsteal) is close to 0%.
  2. The total pgsteal is close to the total requested through
     memory.reclaim, i.e., sum(pgsteal)/sum(requested) is close to
     100%.
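
The two ratios above could be computed from the per-memcg pgsteal
counters along the following lines. This is a minimal sketch, not part
of the series: the function names pgsteal_cov and pgsteal_ratio are
hypothetical, the population standard deviation is an assumption (the
text does not say which variant was used), and the page size is passed
in explicitly because pgsteal is counted in pages while memory.reclaim
takes a byte count.

```shell
# Hypothetical helpers, not part of the patch series.

# pgsteal_cov V1 V2 ...
#   Print stddev(V)/mean(V) as a percentage.
#   Assumes the population standard deviation.
pgsteal_cov() {
    printf '%s\n' "$@" | awk '
        { sum += $1; sumsq += $1 * $1; n++ }
        END {
            if (n == 0 || sum == 0) { print "0.0"; exit }
            mean = sum / n
            var = sumsq / n - mean * mean
            if (var < 0) var = 0    # guard against rounding error
            printf "%.1f\n", 100 * sqrt(var) / mean
        }'
}

# pgsteal_ratio PAGE_SIZE REQUESTED_BYTES V1 V2 ...
#   Print sum(V) * PAGE_SIZE / REQUESTED_BYTES as a percentage.
pgsteal_ratio() {
    page_size=$1
    requested=$2
    shift 2
    printf '%s\n' "$@" | awk -v ps="$page_size" -v req="$requested" '
        { sum += $1 }
        END { printf "%.1f\n", 100 * sum * ps / req }'
}
```

For example, `pgsteal_cov 5 15` prints 50.0, and
`pgsteal_ratio 4096 8192 1 1` prints 100.0. For a run that writes
256m to memory.reclaim 600 times, and assuming the m suffix denotes
MiB, the requested total would be 600 * 256 MiB = 150 GiB.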

  Actual outcome [1]:
             stddev(pgsteal)/mean(pgsteal) sum(pgsteal)/sum(requested)
  MGLRU off  75%                           425%
  MGLRU on   20%                           95%

  ####################################################################
  MEMCGS=128

  for ((memcg = 0; memcg < $MEMCGS; memcg++)); do
      mkdir /sys/fs/cgroup/memcg$memcg
  done

  start() {
      echo $BASHPID > /sys/fs/cgroup/memcg$memcg/cgroup.procs

      fio --name=memcg$memcg --numjobs=1 --ioengine=mmap \
          --filename=/dev/zero --size=1920M --rw=randrw \
          --rate=64m,64m --random_distribution=random \
          --fadvise_hint=0 --time_based --runtime=10h \
          --group_reporting --minimal
  }

  for ((memcg = 0; memcg < $MEMCGS; memcg++)); do
      start &
  done

  sleep 600

  for ((i = 0; i < 600; i++)); do
      echo 256m >/sys/fs/cgroup/memory.reclaim
      sleep 6
  done

  for ((memcg = 0; memcg < $MEMCGS; memcg++)); do
      grep "pgsteal " /sys/fs/cgroup/memcg$memcg/memory.stat
  done
  ####################################################################
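
The final loop of the script prints raw "pgsteal N" lines. For
post-processing, the bare numbers can be extracted with a small helper
such as the one below. It is a sketch only: the name collect_pgsteal
is hypothetical, and the cgroup root is taken as an argument so that
the helper can also be pointed at a copy of the files.

```shell
# Hypothetical helper, not part of the patch series.
# collect_pgsteal ROOT
#   Print one pgsteal value per line for every readable
#   ROOT/memcg*/memory.stat file.
collect_pgsteal() {
    for stat in "$1"/memcg*/memory.stat; do
        [ -r "$stat" ] || continue
        # Match the exact key so that keys which merely start with
        # "pgsteal" (for example pgsteal_kswapd) are skipped.
        awk '$1 == "pgsteal" { print $2 }' "$stat"
    done
}
```

On the test machine this would be invoked as
`collect_pgsteal /sys/fs/cgroup`, matching the paths the script uses.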

[1]: This was obtained by running the above script (which touches less
     than 256GB of memory) on an EPYC 7B13 with 512GB of DRAM for over
     an hour.

Yu Zhao (8):
  mm: multi-gen LRU: rename lru_gen_struct to lru_gen_folio
  mm: multi-gen LRU: rename lrugen->lists[] to lrugen->folios[]
  mm: multi-gen LRU: remove eviction fairness safeguard
  mm: multi-gen LRU: remove aging fairness safeguard
  mm: multi-gen LRU: shuffle should_run_aging()
  mm: multi-gen LRU: per-node lru_gen_folio lists
  mm: multi-gen LRU: clarify scan_control flags
  mm: multi-gen LRU: simplify arch_has_hw_pte_young() check

 Documentation/mm/multigen_lru.rst |   8 +-
 include/linux/memcontrol.h        |  10 +
 include/linux/mm_inline.h         |  25 +-
 include/linux/mmzone.h            | 127 ++++-
 mm/memcontrol.c                   |  16 +
 mm/page_alloc.c                   |   1 +
 mm/vmscan.c                       | 765 ++++++++++++++++++++----------
 mm/workingset.c                   |   4 +-
 8 files changed, 687 insertions(+), 269 deletions(-)

-- 
2.39.0.rc0.267.gcb52ba06e7-goog




