linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Glauber Costa <glommer@parallels.com>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	kamezawa.hiroyu@jp.fujitsu.com,
	Johannes Weiner <hannes@cmpxchg.org>, Tejun Heo <tj@kernel.org>,
	Michal Hocko <mhocko@suse.cz>, Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Glauber Costa <glommer@parallels.com>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	Suleiman Souhlal <suleiman@google.com>
Subject: [PATCH v6 25/29] memcg/sl[au]b: shrink dead caches
Date: Thu,  1 Nov 2012 16:07:41 +0400	[thread overview]
Message-ID: <1351771665-11076-26-git-send-email-glommer@parallels.com> (raw)
In-Reply-To: <1351771665-11076-1-git-send-email-glommer@parallels.com>

This means that when we destroy a memcg cache that happened to be empty,
those caches may take a lot of time to go away: removing the memcg
reference won't destroy them - because there are pending references, and
the empty pages will stay there, until a shrinker is called upon for any
reason.

In this patch, we will call kmem_cache_shrink for all dead caches that
cannot be destroyed because of remaining pages. After shrinking, it is
possible that it could be freed. If this is not the case, we'll schedule
a lazy worker to keep trying.

[ v2: also call verify_dead for the slab ]
[ v3: use delayed_work to avoid calling verify_dead at every free]
[ v6: do not spawn worker if work is already pending ]

Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: Christoph Lameter <cl@linux.com>
CC: Pekka Enberg <penberg@cs.helsinki.fi>
CC: Michal Hocko <mhocko@suse.cz>
CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Johannes Weiner <hannes@cmpxchg.org>
CC: Suleiman Souhlal <suleiman@google.com>
CC: Tejun Heo <tj@kernel.org>
---
 include/linux/slab.h |  2 +-
 mm/memcontrol.c      | 55 ++++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 50 insertions(+), 7 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index ef2314e..0df42db 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -214,7 +214,7 @@ struct memcg_cache_params {
 			struct kmem_cache *root_cache;
 			bool dead;
 			atomic_t nr_pages;
-			struct work_struct destroy;
+			struct delayed_work destroy;
 		};
 	};
 };
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 31da8bc..6e2575a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3048,12 +3048,35 @@ static void kmem_cache_destroy_work_func(struct work_struct *w)
 {
 	struct kmem_cache *cachep;
 	struct memcg_cache_params *p;
+	struct delayed_work *dw = to_delayed_work(w);
 
-	p = container_of(w, struct memcg_cache_params, destroy);
+	p = container_of(dw, struct memcg_cache_params, destroy);
 
 	cachep = memcg_params_to_cache(p);
 
-	if (!atomic_read(&cachep->memcg_params->nr_pages))
+	/*
+	 * If we get down to 0 after shrink, we could delete right away.
+	 * However, memcg_release_pages() already puts us back in the workqueue
+	 * in that case. If we proceed deleting, we'll get a dangling
+	 * reference, and removing the object from the workqueue in that case
+	 * is unnecessary complication. We are not a fast path.
+	 *
+	 * Note that this case is fundamentally different from racing with
+	 * shrink_slab(): if memcg_cgroup_destroy_cache() is called in
+	 * kmem_cache_shrink, not only we would be reinserting a dead cache
+	 * into the queue, but doing so from inside the worker racing to
+	 * destroy it.
+	 *
+	 * So if we aren't down to zero, we'll just schedule a worker and try
+	 * again
+	 */
+	if (atomic_read(&cachep->memcg_params->nr_pages) != 0) {
+		kmem_cache_shrink(cachep);
+		if (atomic_read(&cachep->memcg_params->nr_pages) == 0)
+			return;
+		/* Once per minute should be good enough. */
+		schedule_delayed_work(&cachep->memcg_params->destroy, 60 * HZ);
+	} else
 		kmem_cache_destroy(cachep);
 }
 
@@ -3063,10 +3086,30 @@ void mem_cgroup_destroy_cache(struct kmem_cache *cachep)
 		return;
 
 	/*
+	 * There are many ways in which we can get here.
+	 *
+	 * We can get to a memory-pressure situation while the delayed work is
+	 * still pending to run. The vmscan shrinkers can then release all
+	 * cache memory and get us to destruction. If this is the case, we'll
+	 * be executed twice, which is a bug (the second time will execute over
+	 * bogus data). In this case, cancelling the work should be fine.
+	 *
+	 * But we can also get here from the worker itself, if
+	 * kmem_cache_shrink is enough to shake all the remaining objects and
+	 * get the page count to 0. In this case, we'll deadlock if we try to
+	 * cancel the work (the worker runs with an internal lock held, which
+	 * is the same lock we would hold for cancel_delayed_work_sync().)
+	 *
+	 * Since we can't possibly know who got us here, just refrain from
+	 * running if there is already work pending
+	 */
+	if (delayed_work_pending(&cachep->memcg_params->destroy))
+		return;
+	/*
 	 * We have to defer the actual destroying to a workqueue, because
 	 * we might currently be in a context that cannot sleep.
 	 */
-	schedule_work(&cachep->memcg_params->destroy);
+	schedule_delayed_work(&cachep->memcg_params->destroy, 0);
 }
 
 static char *memcg_cache_name(struct mem_cgroup *memcg, struct kmem_cache *s)
@@ -3218,9 +3261,9 @@ static void mem_cgroup_destroy_all_caches(struct mem_cgroup *memcg)
 	list_for_each_entry(params, &memcg->memcg_slab_caches, list) {
 		cachep = memcg_params_to_cache(params);
 		cachep->memcg_params->dead = true;
-		INIT_WORK(&cachep->memcg_params->destroy,
-			  kmem_cache_destroy_work_func);
-		schedule_work(&cachep->memcg_params->destroy);
+		INIT_DELAYED_WORK(&cachep->memcg_params->destroy,
+				  kmem_cache_destroy_work_func);
+		schedule_delayed_work(&cachep->memcg_params->destroy, 0);
 	}
 	mutex_unlock(&memcg->slab_caches_mutex);
 }
-- 
1.7.11.7

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2012-11-01 12:09 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-01 12:07 [PATCH v6 00/29] kmem controller for memcg Glauber Costa
2012-11-01 12:07 ` [PATCH v6 01/29] memcg: Make it possible to use the stock for more than one page Glauber Costa
2012-11-01 12:07 ` [PATCH v6 02/29] memcg: Reclaim when more than one page needed Glauber Costa
2012-11-01 12:07 ` [PATCH v6 03/29] memcg: change defines to an enum Glauber Costa
2012-11-01 12:07 ` [PATCH v6 04/29] kmem accounting basic infrastructure Glauber Costa
2012-11-01 12:07 ` [PATCH v6 05/29] Add a __GFP_KMEMCG flag Glauber Costa
2012-11-01 19:58   ` Christoph Lameter
2012-11-01 12:07 ` [PATCH v6 06/29] memcg: kmem controller infrastructure Glauber Costa
2012-11-01 20:03   ` Christoph Lameter
2012-11-01 12:07 ` [PATCH v6 07/29] mm: Allocate kernel pages to the right memcg Glauber Costa
2012-11-01 12:07 ` [PATCH v6 08/29] res_counter: return amount of charges after res_counter_uncharge Glauber Costa
2012-11-01 12:07 ` [PATCH v6 09/29] memcg: kmem accounting lifecycle management Glauber Costa
2012-11-01 12:07 ` [PATCH v6 10/29] memcg: use static branches when code not in use Glauber Costa
2012-11-01 12:07 ` [PATCH v6 11/29] memcg: allow a memcg with kmem charges to be destructed Glauber Costa
2012-11-02  0:05   ` Andrew Morton
2012-11-02  7:50     ` Glauber Costa
2012-11-06 10:54     ` Michal Hocko
2012-11-01 12:07 ` [PATCH v6 12/29] execute the whole memcg freeing in free_worker Glauber Costa
2012-11-01 12:07 ` [PATCH v6 13/29] protect architectures where THREAD_SIZE >= PAGE_SIZE against fork bombs Glauber Costa
2012-11-01 12:07 ` [PATCH v6 14/29] Add documentation about the kmem controller Glauber Costa
2012-11-01 12:07 ` [PATCH v6 15/29] slab/slub: struct memcg_params Glauber Costa
2012-11-01 12:07 ` [PATCH v6 16/29] slab: annotate on-slab caches nodelist locks Glauber Costa
2012-11-01 12:07 ` [PATCH v6 17/29] consider a memcg parameter in kmem_create_cache Glauber Costa
2012-11-01 12:07 ` [PATCH v6 18/29] Allocate memory for memcg caches whenever a new memcg appears Glauber Costa
2012-11-06  0:23   ` Andrew Morton
2012-11-07  7:05     ` Glauber Costa
2012-11-07  7:10       ` Andrew Morton
2012-11-01 12:07 ` [PATCH v6 19/29] memcg: infrastructure to match an allocation to the right cache Glauber Costa
2012-11-06  0:28   ` Andrew Morton
2012-11-06  8:03     ` Michal Hocko
2012-11-08 11:05       ` Michal Hocko
2012-11-08 14:33         ` Michal Hocko
2012-11-07  7:04     ` Glauber Costa
2012-11-07  7:13       ` Andrew Morton
2012-11-01 12:07 ` [PATCH v6 20/29] memcg: skip memcg kmem allocations in specified code regions Glauber Costa
2012-11-06  0:33   ` Andrew Morton
2012-11-01 12:07 ` [PATCH v6 21/29] sl[au]b: always get the cache from its page in kmem_cache_free Glauber Costa
2012-11-01 12:07 ` [PATCH v6 22/29] sl[au]b: Allocate objects from memcg cache Glauber Costa
2012-11-01 12:07 ` [PATCH v6 23/29] memcg: destroy memcg caches Glauber Costa
2012-11-02  0:05   ` Andrew Morton
2012-11-02  7:46     ` Glauber Costa
2012-11-02 20:19       ` Michal Hocko
2012-11-06  0:40   ` Andrew Morton
2012-11-01 12:07 ` [PATCH v6 24/29] memcg/sl[au]b Track all the memcg children of a kmem_cache Glauber Costa
2012-11-01 12:07 ` Glauber Costa [this message]
2012-11-06  0:48   ` [PATCH v6 25/29] memcg/sl[au]b: shrink dead caches Andrew Morton
2012-11-07  7:13     ` Glauber Costa
2012-11-07  7:16       ` Andrew Morton
2012-11-07  9:22         ` Glauber Costa
2012-11-07 22:46           ` Andrew Morton
2012-11-08  7:13             ` Glauber Costa
2012-11-08 17:15             ` Christoph Lameter
2012-11-08 19:21               ` Andrew Morton
2012-11-08 22:31                 ` Glauber Costa
2012-11-08 22:40                   ` Andrew Morton
2012-11-09 20:06                     ` Christoph Lameter
2012-11-09 20:04                 ` Christoph Lameter
2012-11-01 12:07 ` [PATCH v6 26/29] Aggregate memcg cache values in slabinfo Glauber Costa
2012-11-06  0:57   ` Andrew Morton
2012-11-01 12:07 ` [PATCH v6 27/29] slab: propagate tunables values Glauber Costa
2012-11-01 12:07 ` [PATCH v6 28/29] slub: slub-specific propagation changes Glauber Costa
2012-11-06 19:25   ` Andrew Morton
2012-11-07 15:53   ` Sasha Levin
2012-11-08  6:51     ` Glauber Costa
2012-11-09  3:37       ` Sasha Levin
2012-11-14 12:06         ` Glauber Costa
2012-11-01 12:07 ` [PATCH v6 29/29] Add slab-specific documentation about the kmem controller Glauber Costa
2012-11-02  0:04 ` [PATCH v6 00/29] kmem controller for memcg Andrew Morton
2012-11-02  7:41   ` Glauber Costa
2012-11-02 19:25     ` JoonSoo Kim
2012-11-02 23:06       ` Tejun Heo
2012-11-05  8:14         ` Glauber Costa
2012-11-05  8:18       ` Glauber Costa
2012-11-03  3:36     ` Greg Thelen
2012-11-02  8:30   ` Pekka Enberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1351771665-11076-26-git-send-email-glommer@parallels.com \
    --to=glommer@parallels.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=hannes@cmpxchg.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=penberg@cs.helsinki.fi \
    --cc=penberg@kernel.org \
    --cc=rientjes@google.com \
    --cc=suleiman@google.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox