Re: [RFC][PATCH 2/4] cgroup ID

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Paul Menage <menage@google.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>,
	"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
	"lizf@cn.fujitsu.com" <lizf@cn.fujitsu.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFC][PATCH 2/4] cgroup ID
Date: Fri, 5 Dec 2008 03:11:23 -0800	[thread overview]
Message-ID: <6599ad830812050311m3728ab69v465ed5d032792973@mail.gmail.com> (raw)
In-Reply-To: <20081205172959.8285271f.kamezawa.hiroyu@jp.fujitsu.com>

Hi Kamezawa,

I definitely agree with the idea of being able to traverse the cgroup
hierarchy without doing a cgroup_lock() and I've included some
comments below. But having said that, maybe there's a simpler
solution?

A while ago I posted some patches that added a per-hierarchy lock
which could be taken to prevent creation or destruction of cgroups in
a given hierarchy; it was lighter-weight than the full cgroup_lock().
Is that sufficient to avoid the deadlock that you mentioned in your
patch description?

The idea of having a short id for each cgroup to save space in the
swap cgroup sounds sensible - but I'm not sure that we need the RCU
support to make the id persist beyond the lifetime of the cgroup
itself.

On Fri, Dec 5, 2008 at 12:29 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> +/*
> + * Cgroup ID for *internal* identification and lookup. For user-land,"path"
> + * of cgroup works well.
> + */

This comment seems misplaced and possibly unnecessary. Should it be
with the struct cgroup_id definition in cgroup.c?

>
> +/*
> + * For supporting cgroup lookup and hierarchy management.
> + */

A lot more commenting would be useful here.

> +/* An interface for usual lookup */
> +struct cgroup *cgroup_lookup(int id);
> +/* get next cgroup under tree (for scan) */
> +struct cgroup *
> +cgroup_get_next(int id, int rootid, int depth, int *foundid);
> +/* get id and depth of cgroup */
> +int cgroup_id(struct cgroup *cgroup);
> +int cgroup_depth(struct cgroup *cgroup);
> +/* For delayed freeing of IDs */
> +int cgroup_id_tryget(int id);
> +void cgroup_id_put(int id);
> +
>  #else /* !CONFIG_CGROUPS */
>
>  /*
> + * CGROUP ID
> + */

More comments needed about the exact semantics of these fields.

> +struct cgroup_id {
> +       struct cgroup *myself;

Can you call this cgroup for consistency with other struct cgroup pointers?

> +       unsigned int  id;
> +       unsigned int  depth;
> +       atomic_t      refcnt;
> +       struct rcu_head rcu_head;
> +       unsigned int  hierarchy_code[MAX_CGROUP_DEPTH];

How about "stack" for this array?

> +};
> +
> +void free_cgroupid_cb(struct rcu_head *head)
> +{
> +       struct cgroup_id *id;
> +
> +       id = container_of(head, struct cgroup_id, rcu_head);
> +       kfree(id);
> +}
> +
> +void free_cgroupid(struct cgroup_id *id)
> +{
> +       call_rcu(&id->rcu_head, free_cgroupid_cb);
> +}
> +

Rather than having a separate RCU callback for the cgroup_id
structure, how about marking it as "dead" when you unlink the cgroup
from the tree, and freeing it in the cgroup_diput() callback at the
same time the struct cgroup is freed? Or is the issue that you need
the id to persist longer than the cgroup itself, to prevent re-use?

> +static DEFINE_IDR(cgroup_idr);
> +DEFINE_SPINLOCK(cgroup_idr_lock);

Any reason to not have a separate idr and idr_lock per hierarchy?

> +
> +static int cgrouproot_setup_idr(struct cgroupfs_root *root)
> +{
> +       struct cgroup_id *newid;
> +       int err = -ENOMEM;
> +       int myid;
> +
> +       newid = kzalloc(sizeof(*newid), GFP_KERNEL);
> +       if (!newid)
> +               goto out;
> +       if (!idr_pre_get(&cgroup_idr, GFP_KERNEL))
> +               goto free_out;
> +
> +       spin_lock_irq(&cgroup_idr_lock);
> +       err = idr_get_new_above(&cgroup_idr, newid, 1, &myid);
> +       spin_unlock_irq(&cgroup_idr_lock);
> +
> +       /* This one is new idr....*/
> +       BUG_ON(err);

There's really no way this can fail?

> +/*
> + * should be called while "cgrp" is valid.
> + */

Can you be more specific here? Clearly calling a function with a
pointer to an object that might have been freed is a bad idea; if
that's all you mean then I don't think it needs to be called out in a
comment.

> +static int cgroup_prepare_id(struct cgroup *parent, struct cgroup_id **id)
> +{
> +       struct cgroup_id *newid;
> +       int myid, error;
> +
> +       /* check depth */
> +       if (parent->id->depth + 1 >= MAX_CGROUP_DEPTH)
> +               return -ENOSPC;
> +       newid = kzalloc(sizeof(*newid), GFP_KERNEL);
> +       if (!newid)
> +               return -ENOMEM;
> +       /* get id */
> +       if (unlikely(!idr_pre_get(&cgroup_idr, GFP_KERNEL))) {
> +               error = -ENOMEM;
> +               goto err_out;
> +       }
> +       spin_lock_irq(&cgroup_idr_lock);
> +       /* Don't use 0 */
> +       error = idr_get_new_above(&cgroup_idr, newid, 1, &myid);
> +       spin_unlock_irq(&cgroup_idr_lock);
> +       if (error)
> +               goto err_out;

This code is pretty similar to a big chunk of cgrouproot_setup_idr() -
can they share the common code?

> +static void cgroup_id_attach(struct cgroup_id *cgid,
> +                            struct cgroup *cg, struct cgroup *parent)
> +{
> +       struct cgroup_id *parent_id = rcu_dereference(parent->id);

It doesn't seem as though it should be necessary to rcu_dereference()
parent->id - parent can't be going away in this case.

> +       int i;
> +
> +       cgid->depth = parent_id->depth + 1;
> +       /* Inherit hierarchy code from parent */
> +       for (i = 0; i < cgid->depth; i++) {
> +               cgid->hierarchy_code[i] =
> +                       parent_id->hierarchy_code[i];
> +               cgid->hierarchy_code[cgid->depth] = cgid->id;

I think this line is supposed to be outside the for() loop.

Paul

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2008-12-05 11:11 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-12-05  8:26 [RFC][PATCH 0/4] cgroup ID and css refcnt change and memcg hierarchy (2008/12/05) KAMEZAWA Hiroyuki
2008-12-05  8:28 ` [RFC][PATCH 1/4] New css->refcnt implementation KAMEZAWA Hiroyuki
2008-12-05  9:39   ` Paul Menage
2008-12-05 11:24     ` KAMEZAWA Hiroyuki
2008-12-10  9:00       ` Paul Menage
2008-12-10 10:23         ` KAMEZAWA Hiroyuki
2008-12-05  8:29 ` [RFC][PATCH 2/4] cgroup ID KAMEZAWA Hiroyuki
2008-12-05 11:11   ` Paul Menage [this message]
2008-12-05 11:50     ` KAMEZAWA Hiroyuki
2008-12-05  8:31 ` [RFC][PATCH 3/4] memcg: hierachical reclaim KAMEZAWA Hiroyuki
2008-12-05  8:32 ` [RFC][PATCH 4/4] fix oom kill under hierarchy KAMEZAWA Hiroyuki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6599ad830812050311m3728ab69v465ed5d032792973@mail.gmail.com \
    --to=menage@google.com \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizf@cn.fujitsu.com \
    --cc=nishimura@mxp.nes.nec.co.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox