From: Roman Gushchin <guroan@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@fb.com, Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>, Rik van Riel <riel@surriel.com>,
	david@fromorbit.com, Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	cgroups@vger.kernel.org, Roman Gushchin <guro@fb.com>
Subject: [PATCH 5/5] mm: reparent slab memory on cgroup removal
Date: Wed, 17 Apr 2019 14:54:34 -0700
Message-ID: <20190417215434.25897-6-guro@fb.com>
In-Reply-To: <20190417215434.25897-1-guro@fb.com>

Let's reparent memcg slab memory on memcg offlining. This allows us
to release the memory cgroup without waiting for the last outstanding
kernel object (e.g. a dentry used by another application).

Instead of reparenting all accounted slab pages, let's reparent
a relatively small number of kmem_caches. Reparenting is performed as
the last part of the deactivation process, so it's guaranteed that no
kmem_cache is active at that point.

Since the parent cgroup is already charged, all we need to do
is move the kmem_cache to the parent's kmem_caches list,
swap the memcg pointer, bump the parent's css refcounter and drop
the child cgroup's refcounter. Quite simple.

We can't race with the slab allocation path, and if we race with
the deallocation path, it's not a big deal: the parent's charge and
slab stats are always correct*, and we no longer care about the child's
usage and stats. The child cgroup is already offline, so we don't use
or show it anywhere.

* please see the comment in kmemcg_cache_deactivate_after_rcu()
  for some additional details

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 mm/memcontrol.c  |  4 +++-
 mm/slab.h        |  4 +++-
 mm/slab_common.c | 28 ++++++++++++++++++++++++++++
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 87c06e342e05..2f61d13df0c4 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3239,7 +3239,6 @@ static void memcg_free_kmem(struct mem_cgroup *memcg)
 	if (memcg->kmem_state == KMEM_ALLOCATED) {
 		WARN_ON(!list_empty(&memcg->kmem_caches));
 		static_branch_dec(&memcg_kmem_enabled_key);
-		WARN_ON(page_counter_read(&memcg->kmem));
 	}
 }
 #else
@@ -4651,6 +4650,9 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 
 	/* The following stuff does not apply to the root */
 	if (!parent) {
+#ifdef CONFIG_MEMCG_KMEM
+		INIT_LIST_HEAD(&memcg->kmem_caches);
+#endif
 		root_mem_cgroup = memcg;
 		return &memcg->css;
 	}
diff --git a/mm/slab.h b/mm/slab.h
index 1f49945f5c1d..be4f04ef65f9 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -329,10 +329,12 @@ static __always_inline void memcg_uncharge_slab(struct page *page, int order,
 		return;
 	}
 
-	memcg = s->memcg_params.memcg;
+	rcu_read_lock();
+	memcg = READ_ONCE(s->memcg_params.memcg);
 	lruvec = mem_cgroup_lruvec(page_pgdat(page), memcg);
 	mod_lruvec_state(lruvec, idx, -(1 << order));
 	memcg_kmem_uncharge_memcg(page, order, memcg);
+	rcu_read_unlock();
 
 	kmemcg_cache_put_many(s, 1 << order);
 }
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 3fdd02979a1c..fc2e86de402f 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -745,7 +745,35 @@ void kmemcg_queue_cache_shutdown(struct kmem_cache *s)
 
 static void kmemcg_cache_deactivate_after_rcu(struct kmem_cache *s)
 {
+	struct mem_cgroup *memcg, *parent;
+
 	__kmemcg_cache_deactivate_after_rcu(s);
+
+	memcg = s->memcg_params.memcg;
+	parent = parent_mem_cgroup(memcg);
+	if (!parent)
+		parent = root_mem_cgroup;
+
+	if (memcg == parent)
+		return;
+
+	/*
+	 * Let's reparent the kmem_cache. It's already deactivated, so we
+	 * can't race with memcg_charge_slab(). We still can race with
+	 * memcg_uncharge_slab(), but it's not a problem. The parent cgroup
+	 * is already charged, so it's ok to uncharge either the parent cgroup
+	 * directly or recursively.
+	 * The same is true for recursive vmstats. Local vmstats are not used
+	 * anywhere, except count_shadow_nodes(). But reparenting will not
+	 * change anything for count_shadow_nodes(): on memcg removal
+	 * shrinker lists are reparented, so it always returns SHRINK_EMPTY
+	 * for non-leaf dead memcgs. For the parent memcgs local slab stats
+	 * are always 0 now, so reparenting will not change anything.
+	 */
+	list_move(&s->memcg_params.kmem_caches_node, &parent->kmem_caches);
+	s->memcg_params.memcg = parent;
+	css_get(&parent->css);
+	css_put(&memcg->css);
 }
 
 static void kmemcg_cache_deactivate(struct kmem_cache *s)
-- 
2.20.1

