From: Yu Zhao <yuzhao@google.com>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, x86@kernel.org,
page-reclaim@google.com, holger@applied-asynchrony.com,
iam@valdikss.org.ru, Yu Zhao <yuzhao@google.com>,
Konstantin Kharlamov <Hi-Angel@yandex.ru>
Subject: [PATCH v5 10/10] mm: multigenerational lru: documentation
Date: Wed, 10 Nov 2021 21:15:10 -0700 [thread overview]
Message-ID: <20211111041510.402534-11-yuzhao@google.com> (raw)
In-Reply-To: <20211111041510.402534-1-yuzhao@google.com>
Add Documentation/vm/multigen_lru.rst.
Signed-off-by: Yu Zhao <yuzhao@google.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
---
Documentation/vm/index.rst | 1 +
Documentation/vm/multigen_lru.rst | 132 ++++++++++++++++++++++++++++++
2 files changed, 133 insertions(+)
create mode 100644 Documentation/vm/multigen_lru.rst
diff --git a/Documentation/vm/index.rst b/Documentation/vm/index.rst
index b51f0d8992f8..779772a025a0 100644
--- a/Documentation/vm/index.rst
+++ b/Documentation/vm/index.rst
@@ -17,6 +17,7 @@ various features of the Linux memory management
swap_numa
zswap
+ multigen_lru
Kernel developers MM documentation
==================================
diff --git a/Documentation/vm/multigen_lru.rst b/Documentation/vm/multigen_lru.rst
new file mode 100644
index 000000000000..7c064a378b85
--- /dev/null
+++ b/Documentation/vm/multigen_lru.rst
@@ -0,0 +1,132 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Multigenerational LRU
+=====================
+
+Quick Start
+===========
+Build Configurations
+--------------------
+:Required: Set ``CONFIG_LRU_GEN=y``.
+
+:Optional: Set ``CONFIG_LRU_GEN_ENABLED=y`` to turn the feature on by
+ default.
+
+Runtime Configurations
+----------------------
+:Required: Write ``1`` to ``/sys/kernel/mm/lru_gen/enable`` if the
+ feature was not turned on by default.
+
+:Optional: Write ``N`` to ``/sys/kernel/mm/lru_gen/min_ttl_ms`` to
+ protect the working set of ``N`` milliseconds. The OOM killer is
+ invoked if this working set cannot be kept in memory.
+
+:Optional: Read ``/sys/kernel/debug/lru_gen`` to confirm the feature
+ is turned on. This file has the following output:
+
+::
+
+ memcg memcg_id memcg_path
+ node node_id
+ min_gen birth_time anon_size file_size
+ ...
+ max_gen birth_time anon_size file_size
+
+``min_gen`` is the oldest generation number and ``max_gen`` is the
+youngest generation number. ``birth_time`` is in milliseconds.
+``anon_size`` and ``file_size`` are in pages.
+
+Phones/Laptops/Workstations
+---------------------------
+No additional configurations required.
+
+Servers/Data Centers
+--------------------
+:To support more generations: Change ``CONFIG_NR_LRU_GENS`` to a
+ larger number.
+
+:To support more tiers: Change ``CONFIG_TIERS_PER_GEN`` to a larger
+ number.
+
+:To support full stats: Set ``CONFIG_LRU_GEN_STATS=y``.
+
+:Working set estimation: Write ``+ memcg_id node_id max_gen
+ [swappiness] [use_bloom_filter]`` to ``/sys/kernel/debug/lru_gen`` to
+ invoke the aging, which scans PTEs for accessed pages and then
+ creates the next generation ``max_gen+1``. A swap file and a non-zero
+ ``swappiness``, which overrides ``vm.swappiness``, are required to
+ scan PTEs mapping anon pages. Set ``use_bloom_filter`` to 0 to
+ override the default behavior which only scans PTE tables found
+ populated.
+
+:Proactive reclaim: Write ``- memcg_id node_id min_gen [swappiness]
+ [nr_to_reclaim]`` to ``/sys/kernel/debug/lru_gen`` to invoke the
+ eviction, which evicts generations less than or equal to ``min_gen``.
+ ``min_gen`` should be less than ``max_gen-1`` as ``max_gen`` and
+ ``max_gen-1`` are not fully aged and therefore cannot be evicted.
+ Use ``nr_to_reclaim`` to limit the number of pages to evict. Multiple
+ command lines are supported, so does concatenation with delimiters
+ ``,`` and ``;``.
+
+Framework
+=========
+For each ``lruvec``, evictable pages are divided into multiple
+generations. The youngest generation number is stored in
+``lrugen->max_seq`` for both anon and file types as they are aged on
+an equal footing. The oldest generation numbers are stored in
+``lrugen->min_seq[]`` separately for anon and file types as clean
+file pages can be evicted regardless of swap and writeback
+constraints. These three variables are monotonically increasing.
+Generation numbers are truncated into
+``order_base_2(CONFIG_NR_LRU_GENS+1)`` bits in order to fit into
+``page->flags``. The sliding window technique is used to prevent
+truncated generation numbers from overlapping. Each truncated
+generation number is an index to an array of per-type and per-zone
+lists ``lrugen->lists``.
+
+Each generation is divided into multiple tiers. Tiers represent
+different ranges of numbers of accesses from file descriptors only.
+Pages accessed ``N`` times via file descriptors belong to tier
+``order_base_2(N)``. Each generation contains at most
+``CONFIG_TIERS_PER_GEN`` tiers, and they require additional
+``CONFIG_TIERS_PER_GEN-2`` bits in ``page->flags``. In contrast to
+moving between generations which requires list operations, moving
+between tiers only involves operations on ``page->flags`` and
+therefore has a negligible cost. A feedback loop modeled after the PID
+controller monitors refaulted % across all tiers and decides when to
+protect pages from which tiers.
+
+The framework comprises two conceptually independent components: the
+aging and the eviction, which can be invoked separately from user
+space for the purpose of working set estimation and proactive reclaim.
+
+Aging
+-----
+The aging produces young generations. Given an ``lruvec``, the aging
+traverses ``lruvec_memcg()->mm_list`` and calls ``walk_page_range()``
+to scan PTEs for accessed pages (a ``mm_struct`` list is maintained
+for each ``memcg``). Upon finding one, the aging updates its
+generation number to ``max_seq`` (modulo ``CONFIG_NR_LRU_GENS``).
+After each round of traversal, the aging increments ``max_seq``. The
+aging is due when ``min_seq[]`` reaches ``max_seq-1``.
+
+Eviction
+--------
+The eviction consumes old generations. Given an ``lruvec``, the
+eviction scans pages on the per-zone lists indexed by anon and file
+``min_seq[]`` (modulo ``CONFIG_NR_LRU_GENS``). It first tries to
+select a type based on the values of ``min_seq[]``. If they are
+equal, it selects the type that has a lower refaulted %. The eviction
+sorts a page according to its updated generation number if the aging
+has found this page accessed. It also moves a page to the next
+generation if this page is from an upper tier that has a higher
+refaulted % than the base tier. The eviction increments ``min_seq[]``
+of a selected type when it finds all the per-zone lists indexed by
+``min_seq[]`` of this selected type are empty.
+
+To-do List
+==========
+KVM Optimization
+----------------
+Support shadow page table walk.
--
2.34.0.rc0.344.g81b53c2807-goog
next prev parent reply other threads:[~2021-11-11 4:15 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-11 4:15 [PATCH v5 00/10] Multigenerational LRU Framework Yu Zhao
2021-11-11 4:15 ` [PATCH v5 01/10] mm: x86, arm64: add arch_has_hw_pte_young() Yu Zhao
2021-11-11 8:59 ` Will Deacon
2021-11-11 11:11 ` Yu Zhao
2021-11-11 4:15 ` [PATCH v5 02/10] mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG Yu Zhao
2021-11-11 4:15 ` [PATCH v5 03/10] mm/vmscan.c: refactor shrink_node() Yu Zhao
2021-11-11 4:15 ` [PATCH v5 04/10] mm: multigenerational lru: groundwork Yu Zhao
2021-11-11 4:15 ` [PATCH v5 05/10] mm: multigenerational lru: mm_struct list Yu Zhao
2021-11-11 4:15 ` [PATCH v5 06/10] mm: multigenerational lru: aging Yu Zhao
2021-11-11 4:15 ` [PATCH v5 07/10] mm: multigenerational lru: eviction Yu Zhao
2021-11-11 4:15 ` [PATCH v5 08/10] mm: multigenerational lru: user interface Yu Zhao
2021-11-11 4:15 ` [PATCH v5 09/10] mm: multigenerational lru: Kconfig Yu Zhao
2021-11-11 4:15 ` Yu Zhao [this message]
2021-11-22 5:32 ` [PATCH v5 00/10] Multigenerational LRU Framework bot
2021-12-02 6:28 ` bot
2021-12-09 7:24 ` bot
2021-12-18 7:10 ` bot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20211111041510.402534-11-yuzhao@google.com \
--to=yuzhao@google.com \
--cc=Hi-Angel@yandex.ru \
--cc=holger@applied-asynchrony.com \
--cc=iam@valdikss.org.ru \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=page-reclaim@google.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox