From: Roman Gushchin <guroan@gmail.com>
To: linux-mm@kvack.org, kernel-team@fb.com
Cc: linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>,
Rik van Riel <riel@surriel.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>, Roman Gushchin <guro@fb.com>
Subject: [PATCH v3 0/6] mm: reduce the memory footprint of dying memory cgroups
Date: Wed, 13 Mar 2019 11:39:47 -0700
Message-ID: <20190313183953.17854-1-guro@fb.com>

A cgroup can remain in the dying state for a long time, pinned in memory
by any kernel object that still references it. It can be pinned by a page
shared with another cgroup (e.g. mlocked by a process in the other cgroup),
by a vfs cache object, etc.

Mostly because of percpu data, a memcg structure takes up quite a lot of
kernel memory. Depending on the machine size and the kernel config, it can
easily reach hundreds of kilobytes per cgroup.

Depending on the memory pressure and the reclaim approach (which is a separate
topic), several hundred (if not a few thousand) dying cgroups appears to be a
typical number. On a moderately sized machine, the overall memory footprint is
measured in hundreds of megabytes.

So if we can't completely get rid of dying cgroups, let's make them smaller.
This patchset aims to reduce the size of a dying memory cgroup by releasing
its percpu data prematurely, at cgroup removal time, and using atomic
counterparts instead. Currently it covers the per-memcg vmstats_percpu and
the per-memcg per-node lruvec_stat_cpu. The same approach can be applied to
other percpu data.
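
A simplified, single-threaded userspace sketch of the idea in C11. This is
not the kernel code: struct memcg_sketch, NR_CPUS_SKETCH and the
memcg_sketch_*() helpers are hypothetical names. While the cgroup is online,
updates go to a per-CPU slot; on removal, the per-CPU deltas are folded into
an atomic counter and the per-CPU array is freed, so later updates and reads
use only the atomic. A real implementation also has to make the pointer
switch safe against concurrent updaters (e.g. with RCU), which this sketch
does not attempt.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count for the sketch */

/* Hypothetical stand-in for one memcg statistic. */
struct memcg_sketch {
	long *percpu;		/* one slot per CPU; NULL once released */
	atomic_long total;	/* folded-in value; the only storage after release */
};

static bool memcg_sketch_init(struct memcg_sketch *m)
{
	m->percpu = calloc(NR_CPUS_SKETCH, sizeof(long));
	atomic_init(&m->total, 0);
	return m->percpu != NULL;
}

/* Online: cheap per-CPU update. Dying: fall back to the atomic counter. */
static void memcg_sketch_add(struct memcg_sketch *m, int cpu, long val)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	if (m->percpu)
		m->percpu[cpu] += val;
	else
		atomic_fetch_add(&m->total, val);
}

/* The value is the atomic part plus any not-yet-folded per-CPU deltas. */
static long memcg_sketch_read(struct memcg_sketch *m)
{
	long sum = atomic_load(&m->total);

	if (m->percpu)
		for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
			sum += m->percpu[cpu];
	return sum;
}

/* Cgroup removal: fold per-CPU deltas into the atomic, then free them. */
static void memcg_sketch_offline(struct memcg_sketch *m)
{
	long *pcp = m->percpu;

	if (!pcp)
		return;		/* already released */
	m->percpu = NULL;	/* from now on, updates go to the atomic */
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		atomic_fetch_add(&m->total, pcp[cpu]);
	free(pcp);
}
```

The reading stays the same across memcg_sketch_offline(); only the storage
changes. The trade-off is that a dying cgroup pays for atomic updates instead
of per-CPU ones, which is presumably acceptable because it should see few
updates after removal.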

Results on my test machine (32 CPUs, single node):

                          With the patchset:  Originally:
nr_dying_descendants 0
  Slab:                   66640 kB            67644 kB
  Percpu:                 6912 kB             6912 kB
nr_dying_descendants 1000
  Slab:                   85912 kB            84704 kB
  Percpu:                 26880 kB            64128 kB

So one dying cgroup went from about 75 kB to about 39 kB, almost half the
size. The difference will be even bigger on bigger machines, especially with
NUMA, because the per-node percpu data is multiplied by the number of nodes.
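
The per-cgroup figures follow from the table: the growth of Slab plus Percpu
between 0 and 1000 dying cgroups, divided by 1000. A small C helper for that
arithmetic (struct meminfo_kb, meminfo_delta_kb() and per_dying_cgroup_kb()
are hypothetical names; it assumes all of the growth is due to the dying
cgroups):

```c
#include <assert.h>

/* Hypothetical holder for the two /proc/meminfo fields above, in kB. */
struct meminfo_kb {
	long slab;
	long percpu;
};

/* Total growth of Slab + Percpu between two snapshots, in kB. */
static long meminfo_delta_kb(struct meminfo_kb before, struct meminfo_kb after)
{
	return (after.slab - before.slab) + (after.percpu - before.percpu);
}

/* Average Slab + Percpu cost of one dying cgroup, in kB. */
static double per_dying_cgroup_kb(struct meminfo_kb before,
				  struct meminfo_kb after, long nr_dying)
{
	assert(nr_dying > 0);
	return (double)meminfo_delta_kb(before, after) / (double)nr_dying;
}
```

With the patchset: (85912 - 66640) + (26880 - 6912) = 39240 kB, i.e. about
39.2 kB per cgroup. Originally: (84704 - 67644) + (64128 - 6912) = 74276 kB,
i.e. about 74.3 kB per cgroup, in line with the rounded numbers above.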

To test the patchset, I used the following script:

  CG=/sys/fs/cgroup/percpu_test
  mkdir ${CG}
  echo "+memory" > ${CG}/cgroup.subtree_control
  grep nr_dying_descendants ${CG}/cgroup.stat
  grep -e Percpu -e Slab /proc/meminfo
  for i in $(seq 1 1000); do
      mkdir ${CG}/${i}
      echo $$ > ${CG}/${i}/cgroup.procs
      # create a page cache page charged to the child; it pins the memcg
      dd if=/dev/urandom of=/tmp/test-${i} count=1 2> /dev/null
      echo $$ > /sys/fs/cgroup/cgroup.procs
      rmdir ${CG}/${i}
  done
  grep nr_dying_descendants ${CG}/cgroup.stat
  grep -e Percpu -e Slab /proc/meminfo
  rmdir ${CG}
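
The script checks cgroup.stat with grep. When repeating the measurement, the
same check can be done from C; a sketch assuming the cgroup v2 cgroup.stat
format of one "key value" pair per line (parse_nr_dying() and
read_nr_dying() are hypothetical helpers):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the nr_dying_descendants value from the text of a cgroup.stat
 * file, or -1 if the key is missing or malformed.
 */
static long parse_nr_dying(const char *stat)
{
	const char *key = "nr_dying_descendants ";
	size_t klen = strlen(key);
	const char *line = stat;

	assert(stat != NULL);
	while (line && *line) {
		if (strncmp(line, key, klen) == 0) {
			long val;

			if (sscanf(line + klen, "%ld", &val) == 1)
				return val;
			return -1;
		}
		line = strchr(line, '\n');	/* advance to the next line */
		if (line)
			line++;
	}
	return -1;
}

/* Read a cgroup.stat file (e.g. ${CG}/cgroup.stat) and parse it. */
static long read_nr_dying(const char *path)
{
	char buf[4096];
	size_t n;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return parse_nr_dying(buf);
}
```

For example, read_nr_dying("/sys/fs/cgroup/percpu_test/cgroup.stat") after
the loop should report 1000 as long as every child is still pinned.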

v3:
  - replaced get_cpu_mask() with cpumask_of() (suggested by Johannes)
v2:
  - several renamings suggested by Johannes Weiner
  - added a patch that merges the cpu offlining and percpu flush code
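
The merged code presumably boils down to one flush routine parameterized by
a CPU mask: CPU hotplug flushes only the dead CPU (cpumask_of(cpu)), while
cgroup removal flushes every CPU. A userspace sketch of that shape, with a
plain bitmask standing in for struct cpumask; all names here
(struct stats_sketch, flush_percpu(), mask_of(), mask_all(), on_cpu_dead(),
on_cgroup_offline()) are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 8	/* hypothetical; must stay below 32 here */

struct stats_sketch {
	long percpu[NR_CPUS_SKETCH];	/* per-CPU deltas */
	long total;			/* folded-in value */
};

/* Fold the deltas of every CPU set in @mask into @s->total. */
static void flush_percpu(struct stats_sketch *s, uint32_t mask)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		if (!(mask & (UINT32_C(1) << cpu)))
			continue;
		s->total += s->percpu[cpu];
		s->percpu[cpu] = 0;
	}
}

/* Analogue of cpumask_of(cpu): a mask with exactly one CPU set. */
static uint32_t mask_of(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	return UINT32_C(1) << cpu;
}

/* A mask with all CPUs set. */
static uint32_t mask_all(void)
{
	return (UINT32_C(1) << NR_CPUS_SKETCH) - 1;
}

/* CPU hotplug: only the dead CPU's deltas need folding. */
static void on_cpu_dead(struct stats_sketch *s, int cpu)
{
	flush_percpu(s, mask_of(cpu));
}

/* Cgroup removal: fold every CPU before the per-CPU data is released. */
static void on_cgroup_offline(struct stats_sketch *s)
{
	flush_percpu(s, mask_all());
}
```

Sharing flush_percpu() keeps a single implementation of the folding logic;
the two callers differ only in which CPUs they pass.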
Roman Gushchin (6):
mm: prepare to premature release of memcg->vmstats_percpu
mm: prepare to premature release of per-node lruvec_stat_cpu
mm: release memcg percpu data prematurely
mm: release per-node memcg percpu data prematurely
mm: flush memcg percpu stats and events before releasing
mm: refactor memcg_hotplug_cpu_dead() to use
memcg_flush_offline_percpu()
include/linux/memcontrol.h | 66 ++++++++++----
mm/memcontrol.c | 179 ++++++++++++++++++++++++++++---------
2 files changed, 186 insertions(+), 59 deletions(-)
--
2.20.1