From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Mel Gorman <mgorman@suse.de>, Michal Hocko <mhocko@kernel.org>
Cc: "Andi Kleen" <ak@linux.intel.com>,
"Aneesh Kumar" <aneesh.kumar@linux.ibm.com>,
"Barry Song" <21cnbao@gmail.com>,
"Catalin Marinas" <catalin.marinas@arm.com>,
"Dave Hansen" <dave.hansen@linux.intel.com>,
"Hillf Danton" <hdanton@sina.com>, "Jens Axboe" <axboe@kernel.dk>,
"Jesse Barnes" <jsbarnes@google.com>,
"Jonathan Corbet" <corbet@lwn.net>,
"Linus Torvalds" <torvalds@linux-foundation.org>,
"Matthew Wilcox" <willy@infradead.org>,
"Michael Larabel" <Michael@michaellarabel.com>,
"Mike Rapoport" <rppt@kernel.org>,
"Rik van Riel" <riel@surriel.com>,
"Vlastimil Babka" <vbabka@suse.cz>,
"Will Deacon" <will@kernel.org>,
"Ying Huang" <ying.huang@intel.com>,
linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
page-reclaim@google.com, x86@kernel.org,
"Yu Zhao" <yuzhao@google.com>,
"Brian Geffon" <bgeffon@google.com>,
"Jan Alexander Steffens" <heftig@archlinux.org>,
"Oleksandr Natalenko" <oleksandr@natalenko.name>,
"Steven Barrett" <steven@liquorix.net>,
"Suleiman Souhlal" <suleiman@google.com>,
"Daniel Byrne" <djbyrne@mtu.edu>,
"Donald Carr" <d@chaos-reins.com>,
"Holger Hoffstätte" <holger@applied-asynchrony.com>,
"Konstantin Kharlamov" <Hi-Angel@yandex.ru>,
"Shuang Zhai" <szhai2@cs.rochester.edu>,
"Sofia Trinh" <sofia.trinh@edi.works>
Subject: [PATCH v7 08/12] mm: multigenerational LRU: optimize multiple memcgs
Date: Tue, 8 Feb 2022 01:18:58 -0700
Message-ID: <20220208081902.3550911-9-yuzhao@google.com>
In-Reply-To: <20220208081902.3550911-1-yuzhao@google.com>

When multiple memcgs are available, it's possible to improve the
overall performance under global memory pressure by making better
choices based on generations and tiers. This patch adds a rudimentary
optimization that selects memcgs that can drop single-use unmapped
clean pages first, which reduces the chance of going into the aging
path or swapping, both of which can be costly. Its goal is to improve
the overall performance when there are mixed types of workloads, e.g.,
a heavy anon workload in one memcg and a heavy buffered I/O workload
in the other.

Though this optimization can be applied to both kswapd and direct
reclaim, it's only added to kswapd to keep the patchset manageable.
Later improvements will cover the direct reclaim path.

Server benchmark results:
  Mixed workloads:
    fio (buffered I/O): -[28, 30]%
                IOPS         BW
      patch1-7: 3117k        11.9GiB/s
      patch1-8: 2217k        8661MiB/s

    memcached (anon): +[247, 251]%
                Ops/sec      KB/sec
      patch1-7: 563772.35    21900.01
      patch1-8: 1968343.76   76461.24

  Mixed workloads:
    fio (buffered I/O): -[4, 6]%
                IOPS         BW
      5.17-rc2: 2338k        9133MiB/s
      patch1-8: 2217k        8661MiB/s

    memcached (anon): +[524, 530]%
                Ops/sec      KB/sec
      5.17-rc2: 313821.65    12190.55
      patch1-8: 1968343.76   76461.24

  Configurations:
    (changes since patch 5)

    cat combined.sh
    modprobe brd rd_nr=2 rd_size=56623104

    swapoff -a
    mkswap /dev/ram0
    swapon /dev/ram0

    mkfs.ext4 /dev/ram1
    mount -t ext4 /dev/ram1 /mnt

    memtier_benchmark -S /var/run/memcached/memcached.sock \
      -P memcache_binary -n allkeys --key-minimum=1 \
      --key-maximum=50000000 --key-pattern=P:P -c 1 -t 36 \
      --ratio 1:0 --pipeline 8 -d 2000

    fio -name=mglru --numjobs=36 --directory=/mnt --size=1408m \
      --buffered=1 --ioengine=io_uring --iodepth=128 \
      --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
      --rw=randread --random_distribution=random --norandommap \
      --time_based --ramp_time=10m --runtime=90m --group_reporting &
    pid=$!

    sleep 200

    memtier_benchmark -S /var/run/memcached/memcached.sock \
      -P memcache_binary -n allkeys --key-minimum=1 \
      --key-maximum=50000000 --key-pattern=R:R -c 1 -t 36 \
      --ratio 0:1 --pipeline 8 --randomize --distinct-client-seed

    kill -INT $pid
    wait

Client benchmark results:
  no change (CONFIG_MEMCG=n)

Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
---
mm/vmscan.c | 45 +++++++++++++++++++++++++++++++++++++++++----
1 file changed, 41 insertions(+), 4 deletions(-)
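
Illustrative note (not part of the patch): the interaction between the
three new scan_control bits may be easier to follow as a small
user-space model. The sketch below is only an approximation under
stated assumptions. The struct and function names (memcg_hints,
model_age_node, model_evict_feedback, model_stop_evicting) and the
MODEL_MIN_LRU_BATCH value of 64 are hypothetical stand-ins, and the
model ignores the nr_to_scan bookkeeping that the real eviction loop
performs before it clears memcgs_need_swapping.

```c
#include <stdbool.h>

/* Assumed stand-in for the kernel's MIN_LRU_BATCH; 64 is an illustrative value. */
#define MODEL_MIN_LRU_BATCH 64UL

/* Hypothetical user-space copy of the three bits added to struct scan_control. */
struct memcg_hints {
	bool need_aging;
	bool need_swapping;
	bool avoid_swapping;
};

/*
 * Models the check at the top of lru_gen_age_node(). Returns true when the
 * aging path should run and false when it is optimistically skipped.
 */
static bool model_age_node(struct memcg_hints *h)
{
	if (!h->need_aging) {
		/*
		 * The bit was cleared by the eviction path (or is still in its
		 * zero-initialized state), so skip aging for this pass and
		 * re-arm the hints for the next one.
		 */
		h->need_aging = true;
		h->avoid_swapping = !h->need_swapping;
		h->need_swapping = true;
		return false;
	}

	h->need_swapping = true;
	h->avoid_swapping = true;
	return true;
}

/*
 * Models the per-lruvec feedback from the eviction path: one lruvec that can
 * be evicted from without aging clears need_aging, and one lruvec that
 * reclaimed a full batch without swapping clears need_swapping.
 */
static void model_evict_feedback(struct memcg_hints *h, bool lruvec_needs_aging,
				 bool swapped, unsigned long reclaimed)
{
	if (!lruvec_needs_aging)
		h->need_aging = false;
	if (!swapped && reclaimed >= MODEL_MIN_LRU_BATCH)
		h->need_swapping = false;
}

/* Models the early break added to the loop in lru_gen_shrink_lruvec(). */
static bool model_stop_evicting(const struct memcg_hints *h, int swappiness,
				bool swapped)
{
	return h->avoid_swapping && swappiness < 200 && swapped;
}
```

In this model, a kswapd pass calls model_age_node() once and then
model_evict_feedback() once per memcg. A pass in which at least one
memcg reclaims a full batch of file pages leaves avoid_swapping set
for the next pass, so memcgs that would have to swap are left alone
unless swappiness is 200.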
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5ab6cd332fcc..fc09b6c10624 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -127,6 +127,13 @@ struct scan_control {
/* Always discard instead of demoting to lower tier memory */
unsigned int no_demotion:1;
+#ifdef CONFIG_LRU_GEN
+ /* help make better choices when multiple memcgs are available */
+ unsigned int memcgs_need_aging:1;
+ unsigned int memcgs_need_swapping:1;
+ unsigned int memcgs_avoid_swapping:1;
+#endif
+
/* Allocation order */
s8 order;
@@ -4343,6 +4350,22 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
VM_BUG_ON(!current_is_kswapd());
+ /*
+ * To reduce the chance of going into the aging path or swapping, which
+ * can be costly, optimistically skip them unless their corresponding
+ * flags were cleared in the eviction path. This improves the overall
+ * performance when multiple memcgs are available.
+ */
+ if (!sc->memcgs_need_aging) {
+ sc->memcgs_need_aging = true;
+ sc->memcgs_avoid_swapping = !sc->memcgs_need_swapping;
+ sc->memcgs_need_swapping = true;
+ return;
+ }
+
+ sc->memcgs_need_swapping = true;
+ sc->memcgs_avoid_swapping = true;
+
current->reclaim_state->mm_walk = &pgdat->mm_walk;
memcg = mem_cgroup_iter(NULL, NULL, NULL);
@@ -4745,7 +4768,8 @@ static int isolate_folios(struct lruvec *lruvec, struct scan_control *sc, int sw
return scanned;
}
-static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness)
+static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness,
+ bool *swapped)
{
int type;
int scanned;
@@ -4810,6 +4834,9 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
sc->nr_reclaimed += reclaimed;
+ if (!type && swapped)
+ *swapped = true;
+
return scanned;
}
@@ -4838,8 +4865,10 @@ static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, bool
if (!nr_to_scan)
return 0;
- if (!need_aging)
+ if (!need_aging) {
+ sc->memcgs_need_aging = false;
return nr_to_scan;
+ }
/* leave the work to lru_gen_age_node() */
if (current_is_kswapd())
@@ -4861,6 +4890,8 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
{
struct blk_plug plug;
long scanned = 0;
+ bool swapped = false;
+ unsigned long reclaimed = sc->nr_reclaimed;
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
struct pglist_data *pgdat = lruvec_pgdat(lruvec);
@@ -4887,13 +4918,19 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
if (!nr_to_scan)
break;
- delta = evict_folios(lruvec, sc, swappiness);
+ delta = evict_folios(lruvec, sc, swappiness, &swapped);
if (!delta)
break;
+ if (sc->memcgs_avoid_swapping && swappiness < 200 && swapped)
+ break;
+
scanned += delta;
- if (scanned >= nr_to_scan)
+ if (scanned >= nr_to_scan) {
+ if (!swapped && sc->nr_reclaimed - reclaimed >= MIN_LRU_BATCH)
+ sc->memcgs_need_swapping = false;
break;
+ }
cond_resched();
}
--
2.35.0.263.gb82422642f-goog