From: Glauber Costa <glommer@parallels.com>
To: cgroups@vger.kernel.org
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org,
Frederic Weisbecker <fweisbec@gmail.com>,
David Rientjes <rientjes@google.com>,
Pekka Enberg <penberg@kernel.org>, Michal Hocko <mhocko@suse.cz>,
Johannes Weiner <hannes@cmpxchg.org>,
Christoph Lameter <cl@linux.com>,
devel@openvz.org, kamezawa.hiroyu@jp.fujitsu.com,
Tejun Heo <tj@kernel.org>, Glauber Costa <glommer@parallels.com>,
Pekka Enberg <penberg@cs.helsinki.fi>,
Suleiman Souhlal <suleiman@google.com>
Subject: [PATCH 09/11] memcg: propagate kmem limiting information to children
Date: Mon, 25 Jun 2012 18:15:26 +0400
Message-ID: <1340633728-12785-10-git-send-email-glommer@parallels.com>
In-Reply-To: <1340633728-12785-1-git-send-email-glommer@parallels.com>
The current memcg slab cache management fails to present satisfactory
hierarchical behavior in the following scenario:
-> /cgroups/memory/A/B/C
* kmem limit set at A
* A and B empty taskwise
* bash in C does find /
Because kmem_accounted is a boolean that was not set for C, no accounting
would be done. This is, however, not what we expect.
The basic idea is that when a cgroup is limited, we walk the tree
upwards (something Kame and I already thought about doing for other purposes),
and make sure that we store the information about the parent being limited in
kmem_accounted (which is turned into a bitmap: two booleans would not be space
efficient). The code for that is taken from sched/core.c. My reason for not
putting it into a common place is to dodge the type issues that would arise
from a common implementation between memcg and the scheduler - but I think
that it should ultimately happen, so if you want me to do it now, let me
know.
We do the reverse operation when a formerly limited cgroup becomes unlimited.
Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: Christoph Lameter <cl@linux.com>
CC: Pekka Enberg <penberg@cs.helsinki.fi>
CC: Michal Hocko <mhocko@suse.cz>
CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Johannes Weiner <hannes@cmpxchg.org>
CC: Suleiman Souhlal <suleiman@google.com>
---
mm/memcontrol.c | 86 +++++++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 71 insertions(+), 15 deletions(-)
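The two-bit scheme described in the changelog can be modeled in userspace roughly as follows. This is only a sketch under stated assumptions, not the kernel code: `struct node`, `bit_is_set()`, `ancestor_limited()`, `set_limited()` and `is_accounted()` are hypothetical names, the hierarchy is assumed to be enabled, there is no locking or atomic bit handling, and the PARENT bit is recomputed for every node instead of walking only the affected subtree. Only the two bit positions are taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bit positions mirror the patch; everything else is a hypothetical model. */
#define KMEM_ACCOUNTED_THIS   0
#define KMEM_ACCOUNTED_PARENT 1

/* Stand-in for struct mem_cgroup: just a parent link and the bitmap. */
struct node {
	struct node *parent;
	unsigned long kmem_accounted;
};

static bool bit_is_set(const struct node *n, int bit)
{
	return (n->kmem_accounted & (1UL << bit)) != 0;
}

/* True if any proper ancestor of n carries its own limit. */
static bool ancestor_limited(const struct node *n)
{
	for (const struct node *p = n->parent; p; p = p->parent)
		if (bit_is_set(p, KMEM_ACCOUNTED_THIS))
			return true;
	return false;
}

/*
 * Mark 'target' as limited or unlimited, then refresh the PARENT bit of
 * every node in 'all'. A node keeps its PARENT bit as long as at least
 * one ancestor is still limited, which is the "reverse operation" case.
 */
static void set_limited(struct node **all, size_t count,
			struct node *target, bool limited)
{
	if (limited)
		target->kmem_accounted |= 1UL << KMEM_ACCOUNTED_THIS;
	else
		target->kmem_accounted &= ~(1UL << KMEM_ACCOUNTED_THIS);

	for (size_t i = 0; i < count; i++) {
		if (ancestor_limited(all[i]))
			all[i]->kmem_accounted |= 1UL << KMEM_ACCOUNTED_PARENT;
		else
			all[i]->kmem_accounted &= ~(1UL << KMEM_ACCOUNTED_PARENT);
	}
}

/* Accounting applies if either bit is set. */
static bool is_accounted(const struct node *n)
{
	return n->kmem_accounted != 0;
}
```

With a chain A -> B -> C and a limit set only at A, `is_accounted(&C)` is true through the PARENT bit, which is the behavior the A/B/C scenario asks for. If B is then limited and A becomes unlimited, C keeps its PARENT bit because B is still limited.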
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fe5388e..a6a440b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -287,7 +287,11 @@ struct mem_cgroup {
* Should the accounting and control be hierarchical, per subtree?
*/
bool use_hierarchy;
- bool kmem_accounted;
+ /*
+ * bit0: accounted by this cgroup
+ * bit1: accounted by a parent.
+ */
+ volatile unsigned long kmem_accounted;
bool oom_lock;
atomic_t under_oom;
@@ -340,6 +344,9 @@ struct mem_cgroup {
#endif
};
+#define KMEM_ACCOUNTED_THIS 0
+#define KMEM_ACCOUNTED_PARENT 1
+
/* Stuffs for move charges at task migration. */
/*
* Types of charges to be moved. "move_charge_at_immitgrate" is treated as a
@@ -589,7 +596,7 @@ EXPORT_SYMBOL(__mem_cgroup_free_kmem_page);
static void disarm_kmem_keys(struct mem_cgroup *memcg)
{
- if (memcg->kmem_accounted)
+ if (test_bit(KMEM_ACCOUNTED_THIS, &memcg->kmem_accounted))
static_key_slow_dec(&mem_cgroup_kmem_enabled_key);
}
#else
@@ -4027,6 +4034,66 @@ static ssize_t mem_cgroup_read(struct cgroup *cont, struct cftype *cft,
len = scnprintf(str, sizeof(str), "%llu\n", (unsigned long long)val);
return simple_read_from_buffer(buf, nbytes, ppos, str, len);
}
+
+#ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM
+static void mem_cgroup_update_kmem_limit(struct mem_cgroup *memcg, u64 val)
+{
+ struct mem_cgroup *iter;
+
+ mutex_lock(&set_limit_mutex);
+ if (val != RESOURCE_MAX &&
+ !test_and_set_bit(KMEM_ACCOUNTED_THIS, &memcg->kmem_accounted)) {
+
+ /*
+ * Once enabled, can't be disabled. We could in theory
+ * disable it if we haven't yet created any caches, or
+ * if we can shrink them all to death.
+ *
+ * But it is not worth the trouble
+ */
+ static_key_slow_inc(&mem_cgroup_kmem_enabled_key);
+
+ if (!memcg->use_hierarchy)
+ goto out;
+
+ for_each_mem_cgroup_tree(iter, memcg) {
+ if (iter == memcg)
+ continue;
+ set_bit(KMEM_ACCOUNTED_PARENT, &iter->kmem_accounted);
+ }
+
+ } else if (val == RESOURCE_MAX &&
+ test_and_clear_bit(KMEM_ACCOUNTED_THIS, &memcg->kmem_accounted)) {
+
+ if (!memcg->use_hierarchy)
+ goto out;
+
+ for_each_mem_cgroup_tree(iter, memcg) {
+ struct mem_cgroup *parent;
+ if (iter == memcg)
+ continue;
+ /*
+ * We should only have our parent bit cleared if none of
+ * our parents are accounted. The traversal order of
+ * our iter function forces us to always look at the
+ * parents.
+ */
+ parent = parent_mem_cgroup(iter);
+ while (parent && (parent != memcg)) {
+ if (test_bit(KMEM_ACCOUNTED_THIS, &parent->kmem_accounted))
+ goto noclear;
+
+ parent = parent_mem_cgroup(parent);
+ }
+ clear_bit(KMEM_ACCOUNTED_PARENT, &iter->kmem_accounted);
+noclear:
+ continue;
+ }
+ }
+out:
+ mutex_unlock(&set_limit_mutex);
+}
+#endif
/*
* The user of this function is...
* RES_LIMIT.
@@ -4064,19 +4131,8 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
ret = res_counter_set_limit(&memcg->kmem, val);
if (ret)
break;
- /*
- * Once enabled, can't be disabled. We could in theory
- * disable it if we haven't yet created any caches, or
- * if we can shrink them all to death.
- *
- * But it is not worth the trouble
- */
- mutex_lock(&set_limit_mutex);
- if (!memcg->kmem_accounted && val != RESOURCE_MAX) {
- static_key_slow_inc(&mem_cgroup_kmem_enabled_key);
- memcg->kmem_accounted = true;
- }
- mutex_unlock(&set_limit_mutex);
+ mem_cgroup_update_kmem_limit(memcg, val);
+ break;
}
#endif
else
--
1.7.10.2