From: Takero Funaki <flintglass@gmail.com>
To: flintglass@gmail.com, Johannes Weiner <hannes@cmpxchg.org>,
Yosry Ahmed <yosryahmed@google.com>,
Nhat Pham <nphamcs@gmail.com>,
Chengming Zhou <chengming.zhou@linux.dev>,
Jonathan Corbet <corbet@lwn.net>,
Andrew Morton <akpm@linux-foundation.org>,
Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: [PATCH 1/3] mm: zswap: fix global shrinker memcg iteration
Date: Tue, 28 May 2024 04:34:02 +0000
Message-ID: <20240528043404.39327-3-flintglass@gmail.com>
In-Reply-To: <20240528043404.39327-2-flintglass@gmail.com>
Fix an issue where the zswap global shrinker stopped iterating through
the memcg tree.
When `shrink_worker()` encountered an offline memcg, it dropped the
iterator reference, reset zswap_next_shrink to NULL, and restarted from
the tree root, so memcgs later in the tree could be skipped. Now it
releases the offline memcg and continues shrinking with the next memcg
in the tree.
Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
Signed-off-by: Takero Funaki <flintglass@gmail.com>
---
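Illustration only, not part of the commit: the difference between
restarting at the root and skipping an offline memcg can be sketched
with a small userspace model. It rests on simplifying assumptions:
struct group, next_group(), walk_restart() and walk_skip() are
hypothetical names rather than kernel APIs, the memcg tree is flattened
into an array, and reference counting, zswap_shrink_lock and the
MAX_RECLAIM_RETRIES bound are left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_GROUPS 4

/* Hypothetical stand-in for struct mem_cgroup. */
struct group {
	bool online;
	int visits;	/* how many times the walker shrank this group */
};

/*
 * Stand-in for mem_cgroup_iter(NULL, prev, NULL) over a flat array:
 * NULL yields the first group, the last group yields NULL
 * (a full round trip).
 */
static struct group *next_group(struct group *groups, struct group *prev)
{
	if (!prev)
		return &groups[0];
	if (prev == &groups[NR_GROUPS - 1])
		return NULL;
	return prev + 1;
}

/* Old behaviour: an offline group resets the cursor to the root. */
static void walk_restart(struct group *groups, int steps)
{
	struct group *cursor = NULL;

	while (steps-- > 0) {
		cursor = next_group(groups, cursor);
		if (!cursor)
			continue;	/* round trip done, start over */
		if (!cursor->online) {
			cursor = NULL;	/* restart from the root */
			continue;
		}
		cursor->visits++;
	}
}

/* New behaviour: an offline group is skipped and the walk continues. */
static void walk_skip(struct group *groups, int steps)
{
	struct group *cursor = NULL;

	while (steps-- > 0) {
		cursor = next_group(groups, cursor);
		if (!cursor)
			continue;	/* round trip done, start over */
		if (!cursor->online)
			continue;	/* cursor stays put, next step advances */
		cursor->visits++;
	}
}
```

With the second of four groups offline, walk_restart() only ever
reaches the first group, while walk_skip() keeps visiting every online
group in turn.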
mm/zswap.c | 76 ++++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 56 insertions(+), 20 deletions(-)
diff --git a/mm/zswap.c b/mm/zswap.c
index a50e2986cd2f..0b1052cee36c 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -775,12 +775,27 @@ void zswap_folio_swapin(struct folio *folio)
}
}
+/*
+ * This function should be called when a memcg is being offlined.
+ *
+ * Since the global shrinker shrink_worker() may hold a reference
+ * to the memcg, we must check and release the reference in
+ * zswap_next_shrink.
+ *
+ * shrink_worker() must handle the case where this function releases
+ * the reference to the memcg being shrunk.
+ */
void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg)
{
/* lock out zswap shrinker walking memcg tree */
spin_lock(&zswap_shrink_lock);
- if (zswap_next_shrink == memcg)
- zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
+
+ if (READ_ONCE(zswap_next_shrink) == memcg) {
+ /* put back reference and advance the cursor */
+ memcg = mem_cgroup_iter(NULL, memcg, NULL);
+ WRITE_ONCE(zswap_next_shrink, memcg);
+ }
+
spin_unlock(&zswap_shrink_lock);
}
@@ -1312,25 +1327,38 @@ static int shrink_memcg(struct mem_cgroup *memcg)
static void shrink_worker(struct work_struct *w)
{
- struct mem_cgroup *memcg;
+ struct mem_cgroup *memcg = NULL;
+ struct mem_cgroup *next_memcg;
int ret, failures = 0;
unsigned long thr;
/* Reclaim down to the accept threshold */
thr = zswap_accept_thr_pages();
- /* global reclaim will select cgroup in a round-robin fashion. */
+ /* global reclaim will select cgroup in a round-robin fashion.
+ *
+ * We save iteration cursor memcg into zswap_next_shrink,
+ * which can be modified by the offline memcg cleaner
+ * zswap_memcg_offline_cleanup().
+ */
do {
spin_lock(&zswap_shrink_lock);
- zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
- memcg = zswap_next_shrink;
+ next_memcg = READ_ONCE(zswap_next_shrink);
+
+ if (memcg != next_memcg) {
+ /*
+ * Ours was released by offlining.
+ * Use the saved memcg reference.
+ */
+ memcg = next_memcg;
+ } else {
+iternext:
+ /* advance cursor */
+ memcg = mem_cgroup_iter(NULL, memcg, NULL);
+ WRITE_ONCE(zswap_next_shrink, memcg);
+ }
/*
- * We need to retry if we have gone through a full round trip, or if we
- * got an offline memcg (or else we risk undoing the effect of the
- * zswap memcg offlining cleanup callback). This is not catastrophic
- * per se, but it will keep the now offlined memcg hostage for a while.
- *
* Note that if we got an online memcg, we will keep the extra
* reference in case the original reference obtained by mem_cgroup_iter
* is dropped by the zswap memcg offlining callback, ensuring that the
@@ -1345,16 +1373,18 @@ static void shrink_worker(struct work_struct *w)
}
if (!mem_cgroup_tryget_online(memcg)) {
- /* drop the reference from mem_cgroup_iter() */
- mem_cgroup_iter_break(NULL, memcg);
- zswap_next_shrink = NULL;
- spin_unlock(&zswap_shrink_lock);
-
- if (++failures == MAX_RECLAIM_RETRIES)
- break;
-
- goto resched;
+ /*
+ * This is an offline memcg, which we cannot shrink
+ * until its pages are reparented.
+ * Put back the memcg reference before the cleanup
+ * function reads it from zswap_next_shrink.
+ */
+ goto iternext;
}
+ /*
+ * We got an extra memcg reference before unlocking.
+ * The cleaner cannot free it using zswap_next_shrink.
+ */
spin_unlock(&zswap_shrink_lock);
ret = shrink_memcg(memcg);
@@ -1368,6 +1398,12 @@ static void shrink_worker(struct work_struct *w)
resched:
cond_resched();
} while (zswap_total_pages() > thr);
+
+ /*
+ * We can still hold the original memcg reference.
+ * The reference is stored in zswap_next_shrink, and then reused
+ * by the next shrink_worker().
+ */
}
/*********************************
--
2.43.0
Thread overview: 10+ messages
2024-05-28 4:34 [PATCH 0/3] mm: zswap: global shrinker fix and proactive shrink Takero Funaki
2024-05-28 4:34 ` Takero Funaki [this message]
2024-05-28 15:09 ` [PATCH 1/3] mm: zswap: fix global shrinker memcg iteration Nhat Pham
2024-05-29 12:42 ` Takero Funaki
2024-05-28 4:34 ` [PATCH 2/3] mm: zswap: fix global shrinker error handling logic Takero Funaki
2024-05-28 15:11 ` Nhat Pham
2024-05-29 13:00 ` Takero Funaki
2024-05-28 4:34 ` [PATCH 3/3] mm: zswap: proactive shrinking before pool size limit is hit Takero Funaki
2024-05-28 16:01 ` Nhat Pham
2024-05-29 13:27 ` Takero Funaki