From: Paul Menage <menage@google.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>,
"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
"lizf@cn.fujitsu.com" <lizf@cn.fujitsu.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFC][PATCH 2/4] cgroup ID
Date: Fri, 5 Dec 2008 03:11:23 -0800
Message-ID: <6599ad830812050311m3728ab69v465ed5d032792973@mail.gmail.com>
In-Reply-To: <20081205172959.8285271f.kamezawa.hiroyu@jp.fujitsu.com>

Hi Kamezawa,

I definitely agree with the idea of being able to traverse the cgroup
hierarchy without doing a cgroup_lock() and I've included some
comments below. But having said that, maybe there's a simpler
solution?

A while ago I posted some patches that added a per-hierarchy lock
which could be taken to prevent creation or destruction of cgroups in
a given hierarchy; it was lighter-weight than the full cgroup_lock().
Is that sufficient to avoid the deadlock that you mentioned in your
patch description?

The idea of having a short id for each cgroup to save space in the
swap cgroup sounds sensible - but I'm not sure that we need the RCU
support to make the id persist beyond the lifetime of the cgroup
itself.

On Fri, Dec 5, 2008 at 12:29 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> +/*
> + * Cgroup ID for *internal* identification and lookup. For user-land,"path"
> + * of cgroup works well.
> + */
This comment seems misplaced and possibly unnecessary. Should it be
with the struct cgroup_id definition in cgroup.c?
>
> +/*
> + * For supporting cgroup lookup and hierarchy management.
> + */
A lot more commenting would be useful here.
> +/* An interface for usual lookup */
> +struct cgroup *cgroup_lookup(int id);
> +/* get next cgroup under tree (for scan) */
> +struct cgroup *
> +cgroup_get_next(int id, int rootid, int depth, int *foundid);
> +/* get id and depth of cgroup */
> +int cgroup_id(struct cgroup *cgroup);
> +int cgroup_depth(struct cgroup *cgroup);
> +/* For delayed freeing of IDs */
> +int cgroup_id_tryget(int id);
> +void cgroup_id_put(int id);
> +
> #else /* !CONFIG_CGROUPS */
>
> /*
> + * CGROUP ID
> + */
More comments needed about the exact semantics of these fields.
> +struct cgroup_id {
> + struct cgroup *myself;
Can you call this "cgroup" for consistency with other struct cgroup pointers?
> + unsigned int id;
> + unsigned int depth;
> + atomic_t refcnt;
> + struct rcu_head rcu_head;
> + unsigned int hierarchy_code[MAX_CGROUP_DEPTH];
How about "stack" for this array?
> +};
> +
> +void free_cgroupid_cb(struct rcu_head *head)
> +{
> + struct cgroup_id *id;
> +
> + id = container_of(head, struct cgroup_id, rcu_head);
> + kfree(id);
> +}
> +
> +void free_cgroupid(struct cgroup_id *id)
> +{
> + call_rcu(&id->rcu_head, free_cgroupid_cb);
> +}
> +
Rather than having a separate RCU callback for the cgroup_id
structure, how about marking it as "dead" when you unlink the cgroup
from the tree, and freeing it in the cgroup_diput() callback at the
same time the struct cgroup is freed? Or is the issue that you need
the id to persist longer than the cgroup itself, to prevent re-use?
> +static DEFINE_IDR(cgroup_idr);
> +DEFINE_SPINLOCK(cgroup_idr_lock);
Any reason to not have a separate idr and idr_lock per hierarchy?
> +
> +static int cgrouproot_setup_idr(struct cgroupfs_root *root)
> +{
> + struct cgroup_id *newid;
> + int err = -ENOMEM;
> + int myid;
> +
> + newid = kzalloc(sizeof(*newid), GFP_KERNEL);
> + if (!newid)
> + goto out;
> + if (!idr_pre_get(&cgroup_idr, GFP_KERNEL))
> + goto free_out;
> +
> + spin_lock_irq(&cgroup_idr_lock);
> + err = idr_get_new_above(&cgroup_idr, newid, 1, &myid);
> + spin_unlock_irq(&cgroup_idr_lock);
> +
> + /* This one is new idr....*/
> + BUG_ON(err);
There's really no way this can fail?
> +/*
> + * should be called while "cgrp" is valid.
> + */
Can you be more specific here? Clearly calling a function with a
pointer to an object that might have been freed is a bad idea; if
that's all you mean then I don't think it needs to be called out in a
comment.
> +static int cgroup_prepare_id(struct cgroup *parent, struct cgroup_id **id)
> +{
> + struct cgroup_id *newid;
> + int myid, error;
> +
> + /* check depth */
> + if (parent->id->depth + 1 >= MAX_CGROUP_DEPTH)
> + return -ENOSPC;
> + newid = kzalloc(sizeof(*newid), GFP_KERNEL);
> + if (!newid)
> + return -ENOMEM;
> + /* get id */
> + if (unlikely(!idr_pre_get(&cgroup_idr, GFP_KERNEL))) {
> + error = -ENOMEM;
> + goto err_out;
> + }
> + spin_lock_irq(&cgroup_idr_lock);
> + /* Don't use 0 */
> + error = idr_get_new_above(&cgroup_idr, newid, 1, &myid);
> + spin_unlock_irq(&cgroup_idr_lock);
> + if (error)
> + goto err_out;
This code is pretty similar to a big chunk of cgrouproot_setup_idr() -
can they share the common code?
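To illustrate the kind of sharing I mean, here is a rough user-space sketch, not kernel code. The names id_pool_sketch, id_pool_get_above(), cgroup_id_stub and cgroup_id_alloc_sketch() are all made up for illustration; the pool is only a stand-in for the idr_pre_get()/idr_get_new_above() sequence under cgroup_idr_lock. The point is just that both callers go through one allocate-and-unwind path, and that the helper returns an error rather than hitting a BUG_ON().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for the idr: hands out increasing ids and
 * refuses ids >= limit so that the failure path can be exercised.
 * This is NOT the kernel idr API.
 */
struct id_pool_sketch {
	int next;	/* next id to hand out */
	int limit;	/* ids >= limit are refused */
};

static int id_pool_get_above(struct id_pool_sketch *pool, int start, int *out)
{
	assert(out != NULL);
	if (pool->next < start)
		pool->next = start;
	if (pool->next >= pool->limit)
		return -ENOSPC;
	*out = pool->next++;
	return 0;
}

/* Cut-down stand-in for struct cgroup_id. */
struct cgroup_id_stub {
	unsigned int id;
	unsigned int depth;
};

/*
 * Common path for both the root setup and the child case: allocate
 * the structure, reserve an id above 0, and unwind on failure.
 * On error *res is left untouched.
 */
static int cgroup_id_alloc_sketch(struct id_pool_sketch *pool,
				  struct cgroup_id_stub **res)
{
	struct cgroup_id_stub *newid;
	int myid, err;

	newid = calloc(1, sizeof(*newid));
	if (!newid)
		return -ENOMEM;
	/* Don't use 0 */
	err = id_pool_get_above(pool, 1, &myid);
	if (err) {
		free(newid);
		return err;
	}
	newid->id = myid;
	*res = newid;
	return 0;
}
```

With something like this, cgrouproot_setup_idr() would only need to fill in the root-specific fields, and cgroup_prepare_id() would only need the depth check before calling the helper.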
> +static void cgroup_id_attach(struct cgroup_id *cgid,
> + struct cgroup *cg, struct cgroup *parent)
> +{
> + struct cgroup_id *parent_id = rcu_dereference(parent->id);
It doesn't seem as though it should be necessary to rcu_dereference()
parent->id - parent can't be going away in this case.
> + int i;
> +
> + cgid->depth = parent_id->depth + 1;
> + /* Inherit hierarchy code from parent */
> + for (i = 0; i < cgid->depth; i++) {
> + cgid->hierarchy_code[i] =
> + parent_id->hierarchy_code[i];
> + cgid->hierarchy_code[cgid->depth] = cgid->id;
I think this line is supposed to be outside the for() loop.
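Something like the following is what I would have expected. This is only a user-space sketch to show the loop shape: cgroup_id_sketch and cgroup_id_inherit_sketch() are made-up names, and the value of MAX_CGROUP_DEPTH is an assumption since the definition isn't in the quoted part of the patch.

```c
#include <assert.h>

/* Assumed value; the real limit is not shown in the quoted patch. */
#define MAX_CGROUP_DEPTH 10

/* Cut-down stand-in for struct cgroup_id. */
struct cgroup_id_sketch {
	unsigned int id;
	unsigned int depth;
	unsigned int hierarchy_code[MAX_CGROUP_DEPTH];
};

static void cgroup_id_inherit_sketch(struct cgroup_id_sketch *cgid,
				     const struct cgroup_id_sketch *parent_id)
{
	unsigned int i;

	/* The depth check in cgroup_prepare_id() is meant to guarantee this. */
	assert(parent_id->depth + 1 < MAX_CGROUP_DEPTH);

	cgid->depth = parent_id->depth + 1;
	/* Inherit hierarchy code from parent: slots 0 .. depth - 1 */
	for (i = 0; i < cgid->depth; i++)
		cgid->hierarchy_code[i] = parent_id->hierarchy_code[i];
	/* Record our own id once, after the loop, in our own slot */
	cgid->hierarchy_code[cgid->depth] = cgid->id;
}
```

The result is the same either way, since the assignment inside the loop just rewrites the same slot on every iteration, but doing it once after the loop makes the intent clearer.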
Paul