linux-mm.kvack.org archive mirror
From: Glauber Costa <glommer@parallels.com>
To: linux-mm@kvack.org
Cc: cgroups@vger.kernel.org, Mel Gorman <mgorman@suse.de>,
	Tejun Heo <tj@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	kamezawa.hiroyu@jp.fujitsu.com, Christoph Lameter <cl@linux.com>,
	David Rientjes <rientjes@google.com>,
	Pekka Enberg <penberg@kernel.org>,
	devel@openvz.org, Glauber Costa <glommer@parallels.com>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	Suleiman Souhlal <suleiman@google.com>
Subject: [PATCH v4 10/19] memcg: skip memcg kmem allocations in specified code regions
Date: Fri, 12 Oct 2012 17:41:04 +0400
Message-ID: <1350049273-17213-11-git-send-email-glommer@parallels.com>
In-Reply-To: <1350049273-17213-1-git-send-email-glommer@parallels.com>

This patch creates a mechanism that skips memcg allocations during
certain pieces of our core code. It works in much the same way as
preempt_disable()/preempt_enable(): it marks a region within which
all allocations are accounted to the root memcg.

We need this to prevent races in early cache creation, when we
allocate data using caches that are not necessarily created already.
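
As a rough userspace illustration of the marking scheme described above
(not the kernel code itself), the region can be modeled as a per-task
nesting counter. The struct and helper names below are hypothetical
stand-ins; only the counter name is taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for task_struct; only the new counter is modeled. */
struct task {
	unsigned int memcg_kmem_skip_account;
};

/* Enter a skip region. Regions may nest, so this is a counter, not a flag. */
static void stop_kmem_account(struct task *t)
{
	t->memcg_kmem_skip_account++;
}

/* Leave a skip region; accounting resumes once the count drops to zero. */
static void resume_kmem_account(struct task *t)
{
	assert(t->memcg_kmem_skip_account > 0);
	t->memcg_kmem_skip_account--;
}

/* True if an allocation made right now would be charged to the task's memcg. */
static bool kmem_accounted(const struct task *t)
{
	return t->memcg_kmem_skip_account == 0;
}
```

With nested stop/resume pairs, kmem_accounted() stays false until the
outermost region is closed, which is the same nesting behaviour that
preempt_disable()/preempt_enable() give for preemption.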

[ v2: wrap the whole enqueue process, INIT_WORK can alloc memory ]

Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: Christoph Lameter <cl@linux.com>
CC: Pekka Enberg <penberg@cs.helsinki.fi>
CC: Michal Hocko <mhocko@suse.cz>
CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Johannes Weiner <hannes@cmpxchg.org>
CC: Suleiman Souhlal <suleiman@google.com>
CC: Tejun Heo <tj@kernel.org>
---
 include/linux/sched.h |  1 +
 mm/memcontrol.c       | 63 +++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 62 insertions(+), 2 deletions(-)
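
For reviewers, here is a simplified userspace model of the recursion the
skip counter is meant to break. It assumes that every allocation made
while enqueueing a cache creation goes back through the cache lookup.
All names are hypothetical; the real lookup in mm/memcontrol.c does
considerably more than this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct cache { const char *name; };

static struct cache root_cache = { "root" };
static unsigned int skip_account; /* models current->memcg_kmem_skip_account */
static unsigned int lookups;      /* number of times get_cache() was entered */

static struct cache *get_cache(struct cache *cachep);

/* Models the enqueue path: it allocates, and allocations hit get_cache(). */
static void enqueue_create(struct cache *cachep)
{
	skip_account++;          /* plays the role of memcg_stop_kmem_account() */
	(void)get_cache(cachep); /* the allocation made while enqueueing */
	skip_account--;          /* plays the role of memcg_resume_kmem_account() */
}

/* Models the lookup: the per-memcg copy does not exist yet in this sketch. */
static struct cache *get_cache(struct cache *cachep)
{
	lookups++;
	if (skip_account)
		return cachep;   /* inside a skip region: use the root cache */
	enqueue_create(cachep);  /* request the copy, fall back for now */
	return cachep;
}
```

In this model a single top-level get_cache(&root_cache) enters the lookup
exactly twice: once from the caller and once from the allocation inside
the enqueue path, which returns immediately because the counter is set.
Without the increment in enqueue_create() the two functions would call
each other without bound.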

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0d907e1..9fad6c1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1581,6 +1581,7 @@ struct task_struct {
 		unsigned long nr_pages;	/* uncharged usage */
 		unsigned long memsw_nr_pages; /* uncharged mem+swap usage */
 	} memcg_batch;
+	unsigned int memcg_kmem_skip_account;
 #endif
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 	atomic_t ptrace_bp_refcnt;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 148baad..96916bd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2949,6 +2949,41 @@ out:
 	kfree(s->memcg_params);
 }
 
+/*
+ * During the creation of a new cache, we need to disable our accounting
+ * mechanism altogether. This is true even if we are not creating, but rather
+ * just enqueueing new caches to be created.
+ *
+ * This is because that process will trigger allocations; some visible, like
+ * explicit kmallocs to auxiliary data structures, name strings and internal
+ * cache structures; some well concealed, like INIT_WORK() that can allocate
+ * objects during debug.
+ *
+ * If any allocation happens during memcg_kmem_get_cache, we will recurse back
+ * to it. This may not be a bounded recursion: since the first cache creation
+ * failed to complete (waiting on the allocation), we'll just try to create the
+ * cache again, failing at the same point.
+ *
+ * memcg_kmem_get_cache is prepared to abort after seeing a positive count of
+ * memcg_kmem_skip_account. So we enclose anything that might allocate memory
+ * inside the following two functions.
+ */
+static void memcg_stop_kmem_account(void)
+{
+	if (!current->mm)
+		return;
+
+	current->memcg_kmem_skip_account++;
+}
+
+static void memcg_resume_kmem_account(void)
+{
+	if (!current->mm)
+		return;
+
+	current->memcg_kmem_skip_account--;
+}
+
 static char *memcg_cache_name(struct mem_cgroup *memcg, struct kmem_cache *cachep)
 {
 	char *name;
@@ -3008,7 +3043,10 @@ static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
 	if (new_cachep)
 		goto out;
 
+	/* Don't block progress to enqueue caches for internal infrastructure */
+	memcg_stop_kmem_account();
 	new_cachep = kmem_cache_dup(memcg, cachep);
+	memcg_resume_kmem_account();
 
 	if (new_cachep == NULL) {
 		new_cachep = cachep;
@@ -3046,8 +3084,8 @@ static void memcg_create_cache_work_func(struct work_struct *w)
  * Enqueue the creation of a per-memcg kmem_cache.
  * Called with rcu_read_lock.
  */
-static void memcg_create_cache_enqueue(struct mem_cgroup *memcg,
-				       struct kmem_cache *cachep)
+static void __memcg_create_cache_enqueue(struct mem_cgroup *memcg,
+					 struct kmem_cache *cachep)
 {
 	struct create_work *cw;
 
@@ -3066,6 +3104,24 @@ static void memcg_create_cache_enqueue(struct mem_cgroup *memcg,
 	schedule_work(&cw->work);
 }
 
+static void memcg_create_cache_enqueue(struct mem_cgroup *memcg,
+				       struct kmem_cache *cachep)
+{
+	/*
+	 * We need to stop accounting when we kmalloc, because if the
+	 * corresponding kmalloc cache is not yet created, the first allocation
+	 * in __memcg_create_cache_enqueue will recurse.
+	 *
+	 * However, it is better to enclose the whole function. Depending on
+	 * the debugging options enabled, INIT_WORK(), for instance, can
+	 * trigger an allocation. This, too, will make us recurse. Because at
+	 * this point we can't allow ourselves back into memcg_kmem_get_cache,
+	 * the safest choice is to do it like this, wrapping the whole function.
+	 */
+	memcg_stop_kmem_account();
+	__memcg_create_cache_enqueue(memcg, cachep);
+	memcg_resume_kmem_account();
+}
 /*
  * Return the kmem_cache we're supposed to use for a slab allocation.
  * We try to use the current memcg's version of the cache.
@@ -3085,6 +3141,9 @@ struct kmem_cache *__memcg_kmem_get_cache(struct kmem_cache *cachep,
 	struct mem_cgroup *memcg;
 	int idx;
 
+	if (!current->mm || current->memcg_kmem_skip_account)
+		return cachep;
+
 	if (cachep->memcg_params && cachep->memcg_params->memcg)
 		return cachep;
 
-- 
1.7.11.4
