From: Glauber Costa <glommer@parallels.com>
To: Tejun Heo <tj@kernel.org>
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	kamezawa.hiroyu@jp.fujitsu.com, devel@openvz.org,
	linux-mm@kvack.org, Suleiman Souhlal <suleiman@google.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Mel Gorman <mgorman@suse.de>,
	David Rientjes <rientjes@google.com>,
	Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	Michal Hocko <mhocko@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH v3 06/16] memcg: infrastructure to match an allocation to the right cache
Date: Mon, 24 Sep 2012 12:46:35 +0400
Message-ID: <50601DEB.10705@parallels.com>
In-Reply-To: <20120921183217.GH7264@google.com>

On 09/21/2012 10:32 PM, Tejun Heo wrote:
> On Tue, Sep 18, 2012 at 06:12:00PM +0400, Glauber Costa wrote:
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 04851bb..1cce5c3 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -339,6 +339,11 @@ struct mem_cgroup {
>>  #ifdef CONFIG_INET
>>  	struct tcp_memcontrol tcp_mem;
>>  #endif
>> +
>> +#ifdef CONFIG_MEMCG_KMEM
>> +	/* Slab accounting */
>> +	struct kmem_cache *slabs[MAX_KMEM_CACHE_TYPES];
>> +#endif
> 
> Bah, 400 entry array in struct mem_cgroup.  Can't we do something a
> bit more flexible?
> 

I guess. I would still like it to be an array, though, so we can access
its fields easily. There are two ways around this:

1) Do it like the events mechanism and allocate this in a separate
structure. That adds a pointer chase on every access, and on its own I
don't think it helps much, because the structure gets allocated anyway.
But we could at least defer the allocation to the time when a kmem limit
is set (see the sketch below).

2) The indexes are only assigned once slab_state is FULL, and by that
time a lot of the caches are already initialized. We could, for
instance, allow for "twice the number we have in the system", which
already provides room for a couple more to appear. Combined with the 1st
approach, we could defer the allocation to limit-time and then allow
for, say, 50% more caches than we already have. The pointer chasing may
very well be worth it...
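
Something like this, for the 1st option (completely untested sketch; the
structure and function names are just illustrative, not from the series):

	/*
	 * Option 1, sketched: move the array out of struct mem_cgroup into a
	 * separately allocated structure, sized and allocated only when the
	 * kmem limit is first set.
	 */
	struct memcg_kmem_caches {
		size_t nr_slots;		/* e.g. 2x the caches present at limit-time */
		struct kmem_cache *slabs[];	/* flexible array member */
	};

	static int memcg_alloc_cache_array(struct mem_cgroup *memcg, size_t nr)
	{
		struct memcg_kmem_caches *kc;

		kc = kzalloc(sizeof(*kc) + nr * sizeof(struct kmem_cache *),
			     GFP_KERNEL);
		if (!kc)
			return -ENOMEM;

		kc->nr_slots = nr;
		/* a ->kmem_caches pointer would replace the static ->slabs array */
		memcg->kmem_caches = kc;
		return 0;
	}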

>> +static char *memcg_cache_name(struct mem_cgroup *memcg, struct kmem_cache *cachep)
>> +{
>> +	char *name;
>> +	struct dentry *dentry;
>> +
>> +	rcu_read_lock();
>> +	dentry = rcu_dereference(memcg->css.cgroup->dentry);
>> +	rcu_read_unlock();
>> +
>> +	BUG_ON(dentry == NULL);
>> +
>> +	name = kasprintf(GFP_KERNEL, "%s(%d:%s)",
>> +	    cachep->name, css_id(&memcg->css), dentry->d_name.name);
> 
> Maybe including full path is better, I don't know.

It can get way too big.
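
The name string lives for as long as the cache does, and with the full
path we would be embedding a component of up to PATH_MAX bytes in it,
which then shows up verbatim in every /proc/slabinfo line. Roughly,
using the existing cgroup_path() helper (untested; locking and error
handling elided):

	static char *memcg_cache_name_fullpath(struct mem_cgroup *memcg,
					       struct kmem_cache *cachep)
	{
		char *path, *name;

		path = kmalloc(PATH_MAX, GFP_KERNEL);
		if (!path)
			return NULL;

		/* cgroup_path() needs cgroup_mutex or an otherwise stable cgroup */
		cgroup_path(memcg->css.cgroup, path, PATH_MAX);
		name = kasprintf(GFP_KERNEL, "%s(%d:%s)",
				 cachep->name, css_id(&memcg->css), path);
		kfree(path);
		return name;
	}

For a cgroup three levels deep that gives something like
"dentry(42:/a/b/c)" instead of "dentry(42:c)".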

>> +static void memcg_create_cache_enqueue(struct mem_cgroup *memcg,
>> +				       struct kmem_cache *cachep)
>> +{
>> +	struct create_work *cw;
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(&cache_queue_lock, flags);
>> +	list_for_each_entry(cw, &create_queue, list) {
>> +		if (cw->memcg == memcg && cw->cachep == cachep) {
>> +			spin_unlock_irqrestore(&cache_queue_lock, flags);
>> +			return;
>> +		}
>> +	}
>> +	spin_unlock_irqrestore(&cache_queue_lock, flags);
>> +
>> +	/* The corresponding put will be done in the workqueue. */
>> +	if (!css_tryget(&memcg->css))
>> +		return;
>> +
>> +	cw = kmalloc(sizeof(struct create_work), GFP_NOWAIT);
>> +	if (cw == NULL) {
>> +		css_put(&memcg->css);
>> +		return;
>> +	}
>> +
>> +	cw->memcg = memcg;
>> +	cw->cachep = cachep;
>> +	spin_lock_irqsave(&cache_queue_lock, flags);
>> +	list_add_tail(&cw->list, &create_queue);
>> +	spin_unlock_irqrestore(&cache_queue_lock, flags);
>> +
>> +	schedule_work(&memcg_create_cache_work);
>> +}
> 
> Why create your own worklist and flush mechanism?  Just embed a work
> item in create_work and use a dedicated workqueue for flushing.

I'll take a look at this.
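
If I understand the suggestion right, it would look roughly like this
(untested sketch; the dedicated workqueue and names are illustrative):

	struct create_work {
		struct mem_cgroup *memcg;
		struct kmem_cache *cachep;
		struct work_struct work;	/* embedded, one item per request */
	};

	/* allocated once at init with alloc_workqueue() */
	static struct workqueue_struct *memcg_cache_create_wq;

	static void memcg_create_cache_func(struct work_struct *w)
	{
		struct create_work *cw = container_of(w, struct create_work, work);

		/* the actual per-memcg cache creation would go here */
		css_put(&cw->memcg->css);	/* matches the tryget below */
		kfree(cw);
	}

	static void memcg_create_cache_enqueue(struct mem_cgroup *memcg,
					       struct kmem_cache *cachep)
	{
		struct create_work *cw;

		if (!css_tryget(&memcg->css))
			return;

		cw = kmalloc(sizeof(*cw), GFP_NOWAIT);
		if (!cw) {
			css_put(&memcg->css);
			return;
		}

		cw->memcg = memcg;
		cw->cachep = cachep;
		INIT_WORK(&cw->work, memcg_create_cache_func);
		queue_work(memcg_cache_create_wq, &cw->work);
	}

This drops the list walk that kept the same (memcg, cachep) pair from
being queued twice; that would have to come back in some form, e.g. a
per-memcg bit per cache index.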

> 
>> +/*
>> + * Return the kmem_cache we're supposed to use for a slab allocation.
>> + * We try to use the current memcg's version of the cache.
>> + *
>> + * If the cache does not exist yet and we are the first user of it,
>> + * we either create it immediately, if possible, or create it asynchronously
>> + * in a workqueue.
>> + * In the latter case, we will let the current allocation go through with
>> + * the original cache.
>> + *
>> + * Can't be called in interrupt context or from kernel threads.
>> + * This function needs to be called with rcu_read_lock() held.
>> + */
>> +struct kmem_cache *__memcg_kmem_get_cache(struct kmem_cache *cachep,
>> +					  gfp_t gfp)
>> +{
>> +	struct mem_cgroup *memcg;
>> +	int idx;
>> +	struct task_struct *p;
>> +
>> +	if (cachep->memcg_params.memcg)
>> +		return cachep;
>> +
>> +	idx = cachep->memcg_params.id;
>> +	VM_BUG_ON(idx == -1);
>> +
>> +	rcu_read_lock();
>> +	p = rcu_dereference(current->mm->owner);
>> +	memcg = mem_cgroup_from_task(p);
>> +	rcu_read_unlock();
>> +
>> +	if (!memcg_can_account_kmem(memcg))
>> +		return cachep;
>> +
>> +	if (memcg->slabs[idx] == NULL) {
>> +		memcg_create_cache_enqueue(memcg, cachep);
> 
> Do we want to wait for the work item if @gfp allows?
> 

I tried this once, and it got complicated enough that I deemed it "not
worth it". I honestly don't remember much of the details now; it was one
of the first things I tried, and a lot of time has passed since. If you
think it is absolutely worth it, I can try it again. But at the very
best, I view this as an optimization.
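
For reference, if we did go for it, I imagine the call site in
__memcg_kmem_get_cache() would look roughly like this (untested;
memcg_create_kmem_cache() standing in for whatever ends up doing the
actual creation). One complication is the RCU context documented in the
comment above, which we would have to restructure before we could sleep
here:

	if (memcg->slabs[idx] == NULL) {
		if (gfp & __GFP_WAIT)
			/* caller may sleep: create the cache right away */
			memcg_create_kmem_cache(memcg, cachep);
		else
			memcg_create_cache_enqueue(memcg, cachep);
		if (memcg->slabs[idx] == NULL)
			return cachep;	/* fall back to the original cache */
	}
	return memcg->slabs[idx];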

