From: Roman Gushchin <guro@fb.com>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, kernel-team@fb.com,
	Rik van Riel <riel@surriel.com>, Roman Gushchin <guro@fb.com>,
	Josef Bacik <jbacik@fb.com>, Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: [PATCH v2] mm: slowly shrink slabs with a relatively small number of objects
Date: Tue, 4 Sep 2018 15:47:07 -0700
Message-ID: <20180904224707.10356-1-guro@fb.com>

Commit 9092c71bb724 ("mm: use sc->priority for slab shrink targets")
changed how the target slab pressure is calculated and made it
priority-based:

	delta = freeable >> priority;
	delta *= 4;
	do_div(delta, shrinker->seeks);

The problem is that at the default priority (which is 12) no pressure
is applied at all if the number of potentially reclaimable objects
is less than 4096 (1<<12).
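
To make the arithmetic concrete, here is a small userspace sketch of the
calculation above. It is not kernel code: the function name slab_delta()
is made up for illustration, and do_div() is replaced by a plain 64-bit
division. With an example seeks value of 2 (an assumption, the message
does not name one), any freeable count below 4096 yields a delta of 0.

```c
#include <assert.h>

/*
 * Userspace sketch of the priority-based scan target quoted above.
 * Hypothetical helper, not part of mm/vmscan.c.
 */
static unsigned long long slab_delta(unsigned long long freeable,
				     int priority, unsigned int seeks)
{
	unsigned long long delta;

	assert(seeks > 0);		/* guard the division below */

	delta = freeable >> priority;	/* 0 whenever freeable < (1 << priority) */
	delta *= 4;
	delta /= seeks;			/* stands in for do_div(delta, seeks) */

	return delta;
}
```

For example, slab_delta(4095, 12, 2) is 0, while slab_delta(4096, 12, 2)
is 2: a list with fewer than 4096 freeable objects gets no scan target at
priority 12.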

As a result, the last objects in the slab caches of cgroups that are no
longer used never get reclaimed, and dead cgroups stay around forever.

Slab LRU lists are reparented on memcg offlining, but the corresponding
objects still hold a reference to the dying cgroup. If we don't scan
them at all, the dying cgroup can't go away. Most likely, the parent
cgroup has no directly associated objects of its own, only the
remaining objects from its dying child cgroups. So it can easily hold
references to hundreds of dying cgroups.

If there are no big spikes in memory pressure, and new memory cgroups
are created and destroyed periodically, the number of dying cgroups
grows steadily, causing a slow-ish and hard-to-detect memory "leak".
It's not a real leak, as the memory can eventually be reclaimed, but
in real life that may never happen. I've seen hosts with a steadily
climbing number of dying cgroups that shows no sign of declining for
months, even though the host is loaded with a production workload.

This is an obvious waste of memory. To prevent it, let's apply minimal
pressure even on small shrinker lists: if there are freeable objects,
let's scan at least min(freeable, scan_batch) objects.
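
The effect of this floor can be sketched in userspace as follows. Again
this is only an illustration under stated assumptions: the function name
slab_delta_with_floor() is hypothetical, plain division stands in for
do_div(), and the batch value of 128 used in the examples is just a
sample number.

```c
#include <assert.h>

/*
 * Userspace sketch: priority-based target with a minimal-pressure floor
 * of min(freeable, scan_batch). Hypothetical helper, not kernel code.
 */
static unsigned long long slab_delta_with_floor(unsigned long long freeable,
						int priority,
						unsigned int seeks,
						unsigned long long scan_batch)
{
	unsigned long long delta, floor;

	assert(seeks > 0);		/* guard the division below */

	delta = freeable >> priority;
	delta *= 4;
	delta /= seeks;			/* stands in for do_div(delta, seeks) */

	/* Never ask for more than exists, and at most one batch as the floor. */
	floor = freeable < scan_batch ? freeable : scan_batch;

	return delta > floor ? delta : floor;
}
```

With priority 12, seeks 2 and a batch of 128, a list of 100 freeable
objects now gets a target of 100 instead of 0, a list of 4095 objects
gets 128, and an empty list still gets 0. Large lists are unaffected:
1048576 objects still produce 512.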

This fix significantly improves the chances of a dying cgroup being
reclaimed and, together with some previous patches, stops the steady
growth in the number of dying cgroups on some of our hosts.

Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
mm/vmscan.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fa2c150ab7b9..8544f4c5cd4f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -476,6 +476,17 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 	delta = freeable >> priority;
 	delta *= 4;
 	do_div(delta, shrinker->seeks);
+
+	/*
+	 * Make sure we apply some minimal pressure even on
+	 * small cgroups. This is necessary because some of
+	 * the objects belonging to them can hold a reference
+	 * to a dying child cgroup. If we don't scan them, the
+	 * dying cgroup can't go away unless the memory pressure
+	 * (and the scanning priority) rises significantly.
+	 */
+	delta = max(delta, min(freeable, batch_size));
+
 	total_scan += delta;
 	if (total_scan < 0) {
 		pr_err("shrink_slab: %pF negative objects to delete nr=%ld\n",
--
2.17.1