From: Muchun Song <songmuchun@bytedance.com>
To: viro@zeniv.linux.org.uk, jack@suse.cz, amir73il@gmail.com,
ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
kafai@fb.com, songliubraving@fb.com, yhs@fb.com,
john.fastabend@gmail.com, kpsingh@kernel.org, mingo@redhat.com,
peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
bristot@redhat.com, hannes@cmpxchg.org, mhocko@kernel.org,
vdavydov.dev@gmail.com, akpm@linux-foundation.org,
shakeelb@google.com, guro@fb.com, songmuchun@bytedance.com,
alex.shi@linux.alibaba.com, alexander.h.duyck@linux.intel.com,
chris@chrisdown.name, richard.weiyang@gmail.com, vbabka@suse.cz,
mathieu.desnoyers@efficios.com, posk@google.com,
jannh@google.com, iamjoonsoo.kim@lge.com, daniel.vetter@ffwll.ch,
longman@redhat.com, walken@google.com,
christian.brauner@ubuntu.com, ebiederm@xmission.com,
keescook@chromium.org, krisman@collabora.com, esyr@redhat.com,
surenb@google.com, elver@google.com
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, bpf@vger.kernel.org,
cgroups@vger.kernel.org, linux-mm@kvack.org,
duanxiongchun@bytedance.com
Subject: [PATCH 0/5] Use obj_cgroup APIs to change kmem pages
Date: Mon, 1 Mar 2021 14:22:22 +0800
Message-ID: <20210301062227.59292-1-songmuchun@bytedance.com>
Since Roman's series "The new cgroup slab memory controller" was applied,
all slab objects are charged via the new obj_cgroup APIs. These APIs
introduce a struct obj_cgroup, which is used to charge slab objects instead
of using struct mem_cgroup directly. This prevents long-living objects from
pinning the original memory cgroup in memory. But there are still some
corner-case objects (e.g. allocations larger than an order-1 page on SLUB)
which are not charged via the obj_cgroup APIs. Those objects (including
pages allocated directly from the buddy allocator) are charged as kmem
pages, which still hold a reference to the memory cgroup.
For example, the kernel stack is charged as kmem pages because its size
can be greater than 2 pages (e.g. 16KB on x86_64 or arm64). Suppose we
create a thread whose kernel stack is charged to memory cgroup A and then
move the thread from memory cgroup A to memory cgroup B. Because the kernel
stack of the thread still holds a reference to memory cgroup A, the thread
pins memory cgroup A in memory even after we remove cgroup A. The following
script reproduces this scenario; after running it, we can see that the
system has gained 500 dying cgroups.
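	#!/bin/bash

	# Memory cgroup count before the test.
	cat /proc/cgroups | grep memory

	cd /sys/fs/cgroup/memory
	echo 1 > memory.move_charge_at_immigrate

	for i in {1..500}
	do
		mkdir kmem_test
		# Move this shell into kmem_test so that the kernel stack of
		# the child forked below is charged to kmem_test.
		echo $$ > kmem_test/cgroup.procs
		sleep 3600 &
		# Move the shell, then the sleep task, back to the root cgroup.
		echo $$ > cgroup.procs
		echo $(cat kmem_test/cgroup.procs) > cgroup.procs
		rmdir kmem_test
	done

	# Memory cgroup count after the test.
	cat /proc/cgroups | grep memory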
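The before/after comparison in the script relies on the num_cgroups column
of the memory line in /proc/cgroups. As a rough illustration only (it is not
part of the patchset), the userspace C helper below extracts that column from
text laid out like /proc/cgroups. The function name num_cgroups_from_text is
made up for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative helper (hypothetical name, not kernel code).
 *
 * Scan text in the /proc/cgroups layout
 *   subsys_name  hierarchy  num_cgroups  enabled
 * and return the num_cgroups value for subsystem `name`,
 * or -1 if that subsystem is not listed.
 */
static long num_cgroups_from_text(const char *text, const char *name)
{
	const char *line = text;

	assert(text != NULL && name != NULL);

	while (*line != '\0') {
		char subsys[64];
		int hierarchy, enabled;
		long num;
		const char *eol = strchr(line, '\n');

		/* The "#subsys_name ..." header fails to parse and is skipped. */
		if (sscanf(line, "%63s %d %ld %d",
			   subsys, &hierarchy, &num, &enabled) == 4 &&
		    strcmp(subsys, name) == 0)
			return num;

		if (eol == NULL)
			break;
		line = eol + 1;
	}

	return -1;
}
```

Calling this on the contents of /proc/cgroups before and after the loop, with
"memory" as the name, gives the two numbers that the script prints with grep.
The difference between them is the growth in memory cgroups, dying ones
included.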
This patchset aims to make those kmem pages drop their reference to the
memory cgroup by using the obj_cgroup APIs. With it applied, the number of
dying cgroups no longer increases when we run the above test script.
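To make the difference concrete, here is a deliberately simplified userspace
model in C. It is only a sketch: the toy_* types and functions are invented
for illustration and are not the kernel's struct mem_cgroup, struct
obj_cgroup, or the functions added by this series. It assumes that a page
either takes a reference on the memory cgroup itself (the old scheme) or on
an intermediate object that can be repointed to the parent when the cgroup
is removed (the new scheme).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a memory cgroup (hypothetical, not kernel code). */
struct toy_memcg {
	struct toy_memcg *parent;
	int refs;	/* references pinning this structure in memory */
	bool online;	/* false once the cgroup directory is removed */
};

/* Toy stand-in for the intermediate object that pages can point to. */
struct toy_objcg {
	struct toy_memcg *memcg;	/* may be switched to the parent */
	int refs;			/* pages charged through this object */
};

/* A cgroup is "dying" if it was removed but is still referenced. */
static bool toy_is_dying(const struct toy_memcg *m)
{
	return !m->online && m->refs > 0;
}

/* Old scheme: the page pins the memory cgroup directly. */
static void toy_charge_direct(struct toy_memcg *m)
{
	m->refs++;
}

static void toy_uncharge_direct(struct toy_memcg *m)
{
	assert(m->refs > 0);
	m->refs--;
}

/* New scheme: the page pins only the intermediate object. */
static void toy_charge_objcg(struct toy_objcg *o)
{
	o->refs++;
}

/* Removing the cgroup repoints its intermediate object to the parent. */
static void toy_rmdir(struct toy_memcg *m, struct toy_objcg *o)
{
	m->online = false;
	if (o != NULL && o->memcg == m && m->parent != NULL)
		o->memcg = m->parent;
}
```

In this model, a cgroup charged with toy_charge_direct() stays dying after
toy_rmdir() until toy_uncharge_direct() is called, which mirrors the kernel
stack case described above. A cgroup whose pages were charged with
toy_charge_objcg() holds no page references of its own, so it is not dying
after toy_rmdir().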
Patches 1-3 use the obj_cgroup APIs to charge kmem pages. The remote
memory cgroup charging APIs are a mechanism to charge kernel memory to a
given memory cgroup, so patches 4-5 make them use the obj_cgroup APIs as
well.
Muchun Song (5):
  mm: memcontrol: introduce obj_cgroup_{un}charge_page
  mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
  mm: memcontrol: reparent the kmem pages on cgroup removal
  mm: memcontrol: move remote memcg charging APIs to CONFIG_MEMCG_KMEM
  mm: memcontrol: use object cgroup for remote memory cgroup charging
fs/buffer.c | 10 +-
fs/notify/fanotify/fanotify.c | 6 +-
fs/notify/fanotify/fanotify_user.c | 2 +-
fs/notify/group.c | 3 +-
fs/notify/inotify/inotify_fsnotify.c | 8 +-
fs/notify/inotify/inotify_user.c | 2 +-
include/linux/bpf.h | 2 +-
include/linux/fsnotify_backend.h | 2 +-
include/linux/memcontrol.h | 109 +++++++++++---
include/linux/sched.h | 6 +-
include/linux/sched/mm.h | 30 ++--
kernel/bpf/syscall.c | 35 ++---
kernel/fork.c | 4 +-
mm/memcontrol.c | 276 ++++++++++++++++++++++-------------
mm/page_alloc.c | 4 +-
15 files changed, 324 insertions(+), 175 deletions(-)
--
2.11.0