Re: [PATCH 17/23] kmem controller charge/uncharge infrastructure

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Glauber Costa <glommer@parallels.com>
To: Suleiman Souhlal <suleiman@google.com>
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, devel@openvz.org,
	kamezawa.hiroyu@jp.fujitsu.com, Michal Hocko <mhocko@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	fweisbec@gmail.com, Greg Thelen <gthelen@google.com>,
	Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@cs.helsinki.fi>
Subject: Re: [PATCH 17/23] kmem controller charge/uncharge infrastructure
Date: Wed, 2 May 2012 12:34:44 -0300	[thread overview]
Message-ID: <4FA15414.3050901@parallels.com> (raw)
In-Reply-To: <CABCjUKBk=RTCoH34XOQHxRsSp0G5iwtgeBKtdNEUeTE5kx07Vg@mail.gmail.com>

On 04/30/2012 05:56 PM, Suleiman Souhlal wrote:
>> +
>> +static void kmem_cache_destroy_work_func(struct work_struct *w)
>> +{
>> +       struct kmem_cache *cachep;
>> +       char *name;
>> +
>> +       spin_lock_irq(&cache_queue_lock);
>> +       while (!list_empty(&destroyed_caches)) {
>> +               cachep = container_of(list_first_entry(&destroyed_caches,
>> +                   struct mem_cgroup_cache_params, destroyed_list), struct
>> +                   kmem_cache, memcg_params);
>> +               name = (char *)cachep->name;
>> +               list_del(&cachep->memcg_params.destroyed_list);
>> +               spin_unlock_irq(&cache_queue_lock);
>> +               synchronize_rcu();
>
> Is this synchronize_rcu() still needed, now that we don't use RCU to
> protect memcgs from disappearing during allocation anymore?
>
> Also, should we drop the memcg reference we got in
> memcg_create_kmem_cache() here?

I have a reworked code I like better for this part.

It reads as follows:

static void kmem_cache_destroy_work_func(struct work_struct *w)
{
         struct kmem_cache *cachep;
         const char *name;
         struct mem_cgroup_cache_params *params, *tmp;
         unsigned long flags;
         LIST_HEAD(delete_unlocked);

         synchronize_rcu();

         spin_lock_irqsave(&cache_queue_lock, flags);
         list_for_each_entry_safe(params, tmp, &destroyed_caches, 
destroyed_list) {
                 cachep = container_of(params, struct kmem_cache, 
memcg_params);
                 list_move(&cachep->memcg_params.destroyed_list, 
&delete_unlocked);
         }
         spin_unlock_irqrestore(&cache_queue_lock, flags);

         list_for_each_entry_safe(params, tmp, &delete_unlocked, 
destroyed_list) {
                 cachep = container_of(params, struct kmem_cache, 
memcg_params);
                 list_del(&cachep->memcg_params.destroyed_list);
                 name = cachep->name;
                 mem_cgroup_put(cachep->memcg_params.memcg);
                 kmem_cache_destroy(cachep);
                 kfree(name);
         }
}

I think having a list in stack is better because we don't need to hold & 
drop the spinlock, and can achieve more parallelism if multiple cpus are 
scheduling the destroy worker.

As you see, this version does a put() - so the answer to your question 
is yes.

synchronize_rcu() also gets a new meaning, in the sense that it only 
waits until everybody that is destroying a cache can have the chance to 
get their stuff into the list.

But to be honest, one of the things we need to do for the next version, 
is audit all the locking rules and write them down...

>> +static void memcg_create_cache_work_func(struct work_struct *w)
>> +{
>> +       struct kmem_cache *cachep;
>> +       struct create_work *cw;
>> +
>> +       spin_lock_irq(&cache_queue_lock);
>> +       while (!list_empty(&create_queue)) {
>> +               cw = list_first_entry(&create_queue, struct create_work, list);
>> +               list_del(&cw->list);
>> +               spin_unlock_irq(&cache_queue_lock);
>> +               cachep = memcg_create_kmem_cache(cw->memcg, cw->cachep);
>> +               if (cachep == NULL)
>> +                       printk(KERN_ALERT
>> +                       "%s: Couldn't create memcg-cache for %s memcg %s\n",
>> +                       __func__, cw->cachep->name,
>> +                       cw->memcg->css.cgroup->dentry->d_name.name);
>
> We might need rcu_dereference() here (and hold rcu_read_lock()).
> Or we could just remove this message.

Don't understand this "or". Again, cache creation can still fail. This 
is specially true in constrained memory situations.

>> +/*
>> + * Return the kmem_cache we're supposed to use for a slab allocation.
>> + * If we are in interrupt context or otherwise have an allocation that
>> + * can't fail, we return the original cache.
>> + * Otherwise, we will try to use the current memcg's version of the cache.
>> + *
>> + * If the cache does not exist yet, if we are the first user of it,
>> + * we either create it immediately, if possible, or create it asynchronously
>> + * in a workqueue.
>> + * In the latter case, we will let the current allocation go through with
>> + * the original cache.
>> + *
>> + * This function returns with rcu_read_lock() held.
>> + */
>> +struct kmem_cache *__mem_cgroup_get_kmem_cache(struct kmem_cache *cachep,
>> +                                            gfp_t gfp)
>> +{
>> +       struct mem_cgroup *memcg;
>> +       int idx;
>> +
>> +       gfp |=  cachep->allocflags;
>> +
>> +       if ((current->mm == NULL))
>> +               return cachep;
>> +
>> +       if (cachep->memcg_params.memcg)
>> +               return cachep;
>> +
>> +       idx = cachep->memcg_params.id;
>> +       VM_BUG_ON(idx == -1);
>> +
>> +       memcg = mem_cgroup_from_task(current);
>> +       if (!mem_cgroup_kmem_enabled(memcg))
>> +               return cachep;
>> +
>> +       if (rcu_access_pointer(memcg->slabs[idx]) == NULL) {
>> +               memcg_create_cache_enqueue(memcg, cachep);
>> +               return cachep;
>> +       }
>> +
>> +       return rcu_dereference(memcg->slabs[idx]);
>
> Is it ok to call rcu_access_pointer() and rcu_dereference() without
> holding rcu_read_lock()?

No, but mem_cgroup_from_task should be called with rcu_read_lock() held 
as well.

I forgot to change it in the comments, but this function should be 
called with the rcu_read_lock() held. I was careful enough to check the 
callers, and they do. But being only human, some of them might have 
escaped...

>> +
>> +bool __mem_cgroup_charge_kmem(gfp_t gfp, size_t size)
>> +{
>> +       struct mem_cgroup *memcg;
>> +       bool ret = true;
>> +
>> +       rcu_read_lock();
>> +       memcg = mem_cgroup_from_task(current);
>> +
>> +       if (!mem_cgroup_kmem_enabled(memcg))
>> +               goto out;
>> +
>> +       mem_cgroup_get(memcg);
>
> Why do we need to get a reference to the memcg for every charge?
> How will this work when deleting a memcg?

There are two charging functions here:

mem_cgroup_charge_slab() and mem_cgroup_charge_kmem() (the later had its 
name changed in my private branch, for the next submission)

The slub allocator will draw large kmalloc allocations directly from the 
page allocator, which is the case this function is designed to handle.

Since we have no cache to bill this against, we need to hold the 
reference here.

>> +       _memcg = memcg;
>> +       ret = __mem_cgroup_try_charge(NULL, gfp, delta / PAGE_SIZE,
>> +&_memcg, may_oom);
>> +       if (ret == -ENOMEM)
>> +               return ret;
>> +       else if ((ret == -EINTR) || (ret&&  (gfp&  __GFP_NOFAIL)))  {
>> +               nofail = true;
>> +               /*
>> +                * __mem_cgroup_try_charge() chose to bypass to root due
>> +                * to OOM kill or fatal signal.
>> +                * Since our only options are to either fail the
>> +                * allocation or charge it to this cgroup, force the
>> +                * change, going above the limit if needed.
>> +                */
>> +               res_counter_charge_nofail(&memcg->res, delta,&fail_res);
>
> We might need to charge memsw here too.

hummm, isn't there a more automated way to do that ?

I'll take a look.

> Might need to uncharge memsw.

Here too.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2012-05-02 15:38 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-04-20 21:57 [PATCH 00/23] slab+slub accounting for memcg Glauber Costa
2012-04-20 21:57 ` [PATCH 01/23] slub: don't create a copy of the name string in kmem_cache_create Glauber Costa
2012-04-20 21:57 ` [PATCH 02/23] slub: always get the cache from its page in kfree Glauber Costa
2012-04-20 21:57 ` [PATCH 03/23] slab: rename gfpflags to allocflags Glauber Costa
2012-04-20 21:57 ` [PATCH 04/23] memcg: Make it possible to use the stock for more than one page Glauber Costa
2012-04-25  0:59   ` KAMEZAWA Hiroyuki
2012-04-20 21:57 ` [PATCH 05/23] memcg: Reclaim when more than one page needed Glauber Costa
2012-04-25  1:16   ` KAMEZAWA Hiroyuki
2012-04-20 21:57 ` [PATCH 06/23] slab: use obj_size field of struct kmem_cache when not debugging Glauber Costa
2012-04-20 21:57 ` [PATCH 07/23] change defines to an enum Glauber Costa
2012-04-25  1:18   ` KAMEZAWA Hiroyuki
2012-04-20 21:57 ` [PATCH 08/23] don't force return value checking in res_counter_charge_nofail Glauber Costa
2012-04-25  1:28   ` KAMEZAWA Hiroyuki
2012-04-20 21:57 ` [PATCH 09/23] kmem slab accounting basic infrastructure Glauber Costa
2012-04-25  1:32   ` KAMEZAWA Hiroyuki
2012-04-25 14:38     ` Glauber Costa
2012-04-26  0:08       ` KAMEZAWA Hiroyuki
2012-04-30 19:33   ` Suleiman Souhlal
2012-05-02 15:15     ` Glauber Costa
2012-04-20 21:57 ` [PATCH 10/23] slab/slub: struct memcg_params Glauber Costa
2012-04-30 19:42   ` Suleiman Souhlal
2012-04-20 21:57 ` [PATCH 11/23] slub: consider a memcg parameter in kmem_create_cache Glauber Costa
2012-04-24 14:03   ` Frederic Weisbecker
2012-04-24 14:27     ` Glauber Costa
2012-04-25  1:38   ` KAMEZAWA Hiroyuki
2012-04-25 14:37     ` Glauber Costa
2012-04-30 19:51   ` Suleiman Souhlal
2012-05-02 15:18     ` Glauber Costa
2012-04-22 23:53 ` [PATCH 12/23] slab: pass memcg parameter to kmem_cache_create Glauber Costa
2012-04-30 19:54   ` Suleiman Souhlal
2012-04-22 23:53 ` [PATCH 13/23] slub: create duplicate cache Glauber Costa
2012-04-24 14:18   ` Frederic Weisbecker
2012-04-24 14:37     ` Glauber Costa
2012-04-26 13:10       ` Frederic Weisbecker
2012-04-30 20:15   ` Suleiman Souhlal
2012-04-22 23:53 ` [PATCH 14/23] slub: provide kmalloc_no_account Glauber Costa
2012-04-22 23:53 ` [PATCH 15/23] slab: create duplicate cache Glauber Costa
2012-04-22 23:53 ` [PATCH 16/23] slab: provide kmalloc_no_account Glauber Costa
2012-04-25  1:44   ` KAMEZAWA Hiroyuki
2012-04-25 14:29     ` Glauber Costa
2012-04-26  0:13       ` KAMEZAWA Hiroyuki
2012-04-22 23:53 ` [PATCH 17/23] kmem controller charge/uncharge infrastructure Glauber Costa
2012-04-23 22:25   ` David Rientjes
2012-04-24 14:22     ` Frederic Weisbecker
2012-04-24 14:40       ` Glauber Costa
2012-04-24 20:25         ` David Rientjes
2012-04-24 21:36           ` Glauber Costa
2012-04-24 22:54             ` David Rientjes
2012-04-25 14:43               ` Glauber Costa
2012-04-24 20:21       ` David Rientjes
2012-04-27 11:38         ` Frederic Weisbecker
2012-04-27 18:13           ` David Rientjes
2012-04-25  1:56       ` KAMEZAWA Hiroyuki
2012-04-25 14:44         ` Glauber Costa
2012-04-27 12:22         ` Frederic Weisbecker
2012-04-30 20:56   ` Suleiman Souhlal
2012-05-02 15:34     ` Glauber Costa [this message]
2012-04-22 23:53 ` [PATCH 18/23] slub: charge allocation to a memcg Glauber Costa
2012-04-22 23:53 ` [PATCH 19/23] slab: per-memcg accounting of slab caches Glauber Costa
2012-04-30 21:25   ` Suleiman Souhlal
2012-05-02 15:40     ` Glauber Costa
2012-04-22 23:53 ` [PATCH 20/23] memcg: disable kmem code when not in use Glauber Costa
2012-04-22 23:53 ` [PATCH 21/23] memcg: Track all the memcg children of a kmem_cache Glauber Costa
2012-04-22 23:53 ` [PATCH 22/23] memcg: Per-memcg memory.kmem.slabinfo file Glauber Costa
2012-04-22 23:53 ` [PATCH 23/23] slub: create slabinfo file for memcg Glauber Costa
2012-04-22 23:59 ` [PATCH 00/23] slab+slub accounting " Glauber Costa
2012-04-30  9:59 ` [PATCH 0/3] A few fixes for '[PATCH 00/23] slab+slub accounting for memcg' series Anton Vorontsov
2012-04-30 10:01   ` [PATCH 1/3] slab: Proper off-slabs handling when duplicating caches Anton Vorontsov
2012-04-30 10:01   ` [PATCH 2/3] slab: Fix imbalanced rcu locking Anton Vorontsov
2012-04-30 10:02   ` [PATCH 3/3] slab: Get rid of mem_cgroup_put_kmem_cache() Anton Vorontsov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FA15414.3050901@parallels.com \
    --to=glommer@parallels.com \
    --cc=cgroups@vger.kernel.org \
    --cc=cl@linux.com \
    --cc=devel@openvz.org \
    --cc=fweisbec@gmail.com \
    --cc=gthelen@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=penberg@cs.helsinki.fi \
    --cc=suleiman@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox